INTRODUCTION


The voluminous literature on transfer reveals that a variety of ways 
have been used to give quantitative expression to this phenomenon. For 
a good many years attempts have been made to measure the effects of 
training in one activity upon training in another, rather than the simpler 
determination of the existence and direction (positive or negative) of 
such effects. Nevertheless these quantitative expressions, because of 
their variety, do not enable one to compare the amount of transfer re- 
sulting under different experimental conditions in any standard or sys- 
tematic way. In many cases, of course, such a comparison may be made 
by using the raw data which are reported. In some instances, comparison
of degrees of transfer cannot be made directly because of the fact
that the learning involved has been measured by means of quite different 
operations, such as counting the number of trials to reach a criterion, 
measuring the time required to perform some activity, or counting the 
frequency of errors. 

The survey on which the present paper is based has revealed that 
transfer has been expressed in several typical ways, with minor varia- 
tions in each. These expressions are: 


1. A raw score, either amount of learning or amount of improvement;
2. Per cent of improvement over some particular stage of the learning of a
control group;
3. Per cent of contribution of the transferred task to the total learning of a
final task;
4. Per cent of savings in learning a final task attributable to having learned
an initial task;
5. Per cent of improvement to be expected in some given number of trials
of direct learning.
6. In addition, the presence or absence of transfer, but not its amount, has
been indicated by the use of coefficients of correlation.


FACTORS PRODUCING VARIATION IN MEASURES OF TRANSFER


There are a number of reasons for variation in the expression of 
amount and direction of transfer which have their origin in the experi- 
mental situation itself. Some of these factors are discussed in the follow- 
ing paragraphs. 


Type of Measure of Learning Employed 


Two general types of learning measures may be distinguished. The 
first type, exemplified by frequency of correct responses or by amount 
performed within a given time interval, yields numerical values which 
increase as learning proceeds. When such measures are used to deter- 
mine the degree to which a given final task has been affected by training 
on an initial task, the absolute value of the learning score on the final 
task may be compared with the score of a control group to indicate
directly the amount of transfer. The comparison is in most cases facilitated
by subtracting, algebraically, the score of the control group on the
final task from the score of the transfer group on this task. The resulting 
measure of transfer exhibits a positive increase in value with greater 
degrees of positive transfer and a negative increase in value with an 
increase in amount of negative transfer. 

It should be emphasized that the utilization of raw score values to 
express transfer is a procedure which has a number of advantages, chief 
among which is precision of meaning. The reader has little difficulty in 
finding evidence of transfer in a comparison of raw performance scores. 
On the other hand, it is not unusual for more elaborate expressions to 
lead to spurious quantitative reasoning. The most obvious limitation 
of raw scores, which most other transfer expressions attempt to over- 
come, is that they do not enable the investigator to make a direct com- 
parison of the amounts of transfer obtained in studies in which different 
tasks have been employed. 

In the case of learning measures which are increasing functions of 
amount of practice, transfer is often expressed not simply as a positive 
or negative amount by which the learning score of the transfer group 
exceeds that of the control group, but as the percent (positive or negative)
which this difference is of the control group’s learning score. In other 
words: 

\[
\text{Per cent improvement} = \frac{\text{T Group Score} - \text{C Group Score}}{\text{C Group Score}} \times 100. \qquad [1]
\]


It is evident that such an expression is of no greater value in making
comparisons with other studies than is the raw score value itself. This 
is because the value of the obtained percentage is strictly dependent 
upon the particular units in which the raw score on the task in question 
is expressed. Thus one cannot say that the amount of transfer obtained 
in an experiment yielding the value of (12 - 10)/10 for Formula [1] is
the same as the amount obtained in an experiment yielding (24 - 20)/20,
because (unless the tasks have been equated) it is not possible to main- 
tain that a unit of improvement in one case is equal to a unit of improve- 
ment in the other. Direct practice may be capable of bringing about 
an improvement from an initial score of 10 to a score of 100 in the first 
task, whereas the amount of improvement possible in the second may 
be only from an initial score of 20 to a score of 40. For these examples, 
obtaining a value of 20 percent transfer in each case is of ambiguous 
significance. 
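
The ambiguity can be made concrete with a short computation. The following Python sketch is offered only as an illustration of Formula [1]; the function names and the idea of printing the gain beside the improvement possible through direct practice are assumptions introduced here, while the scores are those of the two hypothetical experiments just described.

```python
def percent_improvement(t_score: float, c_score: float) -> float:
    """Formula [1]: transfer as a percent of the control group's score.

    Intended for measures that increase as learning proceeds.
    """
    return 100 * (t_score - c_score) / c_score


def share_of_possible_gain(t_score: float, c_score: float, attainable: float) -> float:
    """Gain of the transfer group as a percent of the improvement that
    direct practice can bring about (illustrative helper, not in the source)."""
    return 100 * (t_score - c_score) / (attainable - c_score)


# First task: control 10, transfer 12, direct practice can reach 100.
# Second task: control 20, transfer 24, direct practice can reach 40.
first = percent_improvement(12, 10)
second = percent_improvement(24, 20)
first_share = share_of_possible_gain(12, 10, 100)
second_share = share_of_possible_gain(24, 20, 40)

print(f"Formula [1]: {first:.0f}% and {second:.0f}%")
print(f"Share of possible gain: {first_share:.1f}% and {second_share:.1f}%")
```

Formula [1] gives 20 percent for both tasks, yet the same gains amount to very different fractions of the improvement open to direct practice.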

The second type of learning measure is exemplified by such variables 
as number of errors, time of response, and number of trials to learn to a 
given criterion. In these cases, the numerical value decreases with an 
improvement in performance. However, if the amount of transfer is 
expressed as control group score minus transfer group score, the result- 
ing measure is comparable to that obtained with measures of the type 
discussed in the previous paragraph. Again a greater positive value is 
obtained as the amount of positive transfer increases, and a greater 
negative value as the amount of negative transfer increases. In this case
too, transfer has often been expressed as a percentage, according to 
the formula: 


\[
\text{Per cent improvement} = \frac{\text{C Group Score} - \text{T Group Score}}{\text{C Group Score}} \times 100. \qquad [2]
\]





It is evident that such an expression does not avoid the difficulty 
of being dependent for its value upon the size and nature of the units in 
which the learning is measured, and thus of being impossible of direct 
comparison with degree of transfer measured in other experiments by 
the same method, but on different tasks. 
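
A minimal sketch of Formula [2] may help to fix the sign convention. The function name and the error counts below are hypothetical, chosen only for illustration.

```python
def percent_improvement_decreasing(c_score: float, t_score: float) -> float:
    """Formula [2]: for measures that fall as learning proceeds
    (errors, time of response, trials to a criterion)."""
    return 100 * (c_score - t_score) / c_score


# Hypothetical numbers of errors on the final task (not from any study).
fewer_errors = percent_improvement_decreasing(c_score=20, t_score=15)
more_errors = percent_improvement_decreasing(c_score=20, t_score=25)

print(fewer_errors)  # positive value: positive transfer
print(more_errors)   # negative value: negative transfer
```

Because the subtraction is reversed relative to Formula [1], a transfer group that makes fewer errors than the control group again receives a positive value.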


Degree of Learning of Final Task 


This factor has considerable bearing upon the method of expressing 
transfer. The most commonly occurring situation is one in which the 
learning of the final task is measured, in both control and transfer 
groups, in its initial stages only. This is the usual situation when what 
is called a “test” of transfer is employed; i.e., the particular activity 
under consideration is measured during one, or perhaps two or three, 
initial trials. Sometimes this measurement has been preceded by one or 
two trials in the same activity conducted before training in the initial 
task to obtain a score used in matching the groups. In any case, it may
be said of such experiments that they are concerned with the measure- 
ment of transfer during the initial stages of learning. With such a pro- 
cedure, the possible methods of expressing transfer are limited to the 
statement of degree of transfer as: (a) the difference in raw score of 
control and transfer groups; or (b) the difference in raw score expressed 
as a percent of the control group’s score (Formulas [1] and [2]). 

When learning is measured only in its initial stages, it is evident that 
transfer cannot usually be expressed as a proportion of some score which 
represents the accomplishment of total learning. There are, however, 
some situations in which a measure of total learning of the final task 
may be inferred rather than measured directly. This would be the 
case, for example: (a) when the measure of learning is number of res- 
ponses correct, as in the learning of a list of 12 nonsense syllables; (b) 
when the measure of learning is number of trials to reach a criterion, and 
it can be assumed that complete learning means reaching a certain 
standard of proficiency in one trial (or zero trials); or (c) when matched 
tasks have been employed, and the total improvement possible on the 
final task can be inferred from the same measure in the initial task. In 
these instances, other means of expressing transfer are available and are 
to be described. 

Some studies, which will be referred to specifically in a subsequent 
section, have been concerned with measuring the amount of transfer of 
training to more than one stage of the learning of the final task. In most 
cases, these studies have presented such data in terms of raw scores. 
The attempt to represent transfer as percent of improvement illustrates 
at least one of the shortcomings of this particular measure. For example,
suppose the learning of the final task has been measured at three differ- 
ent points: during the initial stage, in the middle, and in the final stage. 
Suppose the scores representing goodness of performance in the control 
group were found to be 10, 20, and 30, for these three portions of the 
learning process, while those of the transfer group were 14, 24, and 32. 
It is easy to see that the transfer group performed better than the control
group throughout the learning. But how shall the “percent of improvement”
attributable to transfer from an initial task be expressed?
Following the method of Formula [1]:


\[
\text{Per cent improvement, initial stage} = \frac{14 - 10}{10} \times 100 = 40\%
\]
\[
\text{Per cent improvement, middle stage} = \frac{24 - 20}{20} \times 100 = 20\%
\]
\[
\text{Per cent improvement, final stage} = \frac{32 - 30}{30} \times 100 = 7\%.
\]


It would be fallacious to draw the conclusion from these figures that the
effectiveness of the initial task in aiding the learning of the final task, 
while high at first, decreased progressively as learning proceeded. The 
differences in score of the two groups decrease, of course, because in 
both cases learning is approaching its final limit. Unless the negatively 
accelerated portion of the learning curve is taken into consideration, it 
is unjustifiable to make comparisons between the percent of transfer 
obtained by this method in different stages of learning. 
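
The three-stage example can be tabulated with a few lines of Python. This is only a sketch using the scores given above; the variable names are assumptions introduced here. It prints the raw difference beside the percent improvement at each stage.

```python
stages = ("initial", "middle", "final")
control = [10, 20, 30]    # control group scores at the three stages
transfer = [14, 24, 32]   # transfer group scores at the same stages

raw_difference = [t - c for t, c in zip(transfer, control)]
percent_improvement = [100 * (t - c) / c for t, c in zip(transfer, control)]

for stage, diff, pct in zip(stages, raw_difference, percent_improvement):
    print(f"{stage:8s} raw difference = {diff}   percent improvement = {pct:.0f}%")
```

The raw difference is the same (4 points) at the initial and middle stages, yet the percent improvement is halved, simply because the control score used as divisor has doubled.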

Another way of expressing transfer as a percent of direct learning 
has been used in several studies in which learning of the final task has 
been measured at one or more points beyond the initial stage. In this 
case, the transfer found is expressed as a percent of the amount of 
learning accomplished by direct practice in a given number of trials. 
The formula for this expression may be stated as follows: 


\[
\text{Per cent transfer} = \frac{\text{T Group Score} - \text{Initial C Group Score}}{\text{C Group Score on Trial X} - \text{Initial C Group Score}} \times 100. \qquad [3]
\]





Although this expression has not been widely used, it has strong sup- 
porters (25, 68, 121). 

In order to consider concretely the type of measure which results 
from the use of this formula, let us take a fictitious example to represent 
the kind of data which might be expected in a transfer experiment in 
which the measure of performance is number of correct responses after 
0, 10, 20, 30, 40, 50, and 60 trials of direct practice on the final task, and 
in which amount of transfer is measured after equivalent numbers of 
trials on the initial task. Suppose the data are: 


Trials                            0    10    20    30    40    50    60
Direct Practice Score             0     9    13    16    18    19    20
Transfer Score                    0     1     2     3     4     5     6
Percent Transfer (Formula [3])    0%   11%   15%   18%   22%   26%   30%


The percent transfer as computed by Formula [3] is shown in the final 
row of the above table. It is evident that this percentage is a measure 
which varies with the values of two other measures, namely, the amount 
of learning and the amount of transfer. Thus, the percent transfer found 
after 10 trials in the initial task would have been different from 11 per- 
cent if either the amount learned by direct practice in 10 trials were a 
different value, or if the amount of transfer found were a different value. 
The percent value itself does not enable us to determine which of these 
other variables has been most influential in producing the given transfer 
measure. Consequently, although the amount of transfer in this particular
example increases linearly with the number of trials of practice
on the initial task (as shown by the raw score values which increase by 
one point with each additional 10 trials of practice), the values of per- 
cent transfer do not give us any indication of this linear relationship. 
Transfer has been expressed as a proportion of the amount that learning 
can accomplish in given numbers of trials, but not as a proportion of the 
amount that total learning can accomplish. 
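
The final row of the table can be recomputed from the two rows of scores. The sketch below is illustrative only. The function name is an assumption, the 0 percent reported at trial 0 (where no direct learning has yet occurred) is treated as a convention, and the percentages are truncated to whole numbers, which is an assumption about how the tabled values were obtained (3/16 is 18.75 percent and appears in the table as 18 percent).

```python
import math

trials = [0, 10, 20, 30, 40, 50, 60]
direct_practice = [0, 9, 13, 16, 18, 19, 20]  # control group, direct practice
transfer_score = [0, 1, 2, 3, 4, 5, 6]        # transfer group
initial_control = direct_practice[0]


def percent_transfer_f3(t_score, c_score_trial_x, c_initial):
    """Formula [3]: transfer as a percent of direct learning in X trials."""
    learned = c_score_trial_x - c_initial
    if learned == 0:
        return 0.0  # assumption: nothing learned yet, reported as 0% in the table
    return 100 * (t_score - c_initial) / learned


percents = [
    math.floor(percent_transfer_f3(t, c, initial_control))
    for t, c in zip(transfer_score, direct_practice)
]
# Raw transfer gained with each additional 10 trials on the initial task.
increments = [b - a for a, b in zip(transfer_score, transfer_score[1:])]

print(percents)
print(increments)
```

The increments in raw transfer are constant, whereas the successive percent values rise by unequal steps, since each is divided by a different amount of direct learning.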

The expression of transfer obtained from Formula [3] is no doubt 
useful in certain practical instances. One of these, for example, would be 
a situation in which the number of trials which could be devoted to 
training in a given task were limited to a given value. In such a case, 
it would be desirable to know to what extent transfer accomplished the 
amount of direct learning which could take place in this particular num- 
ber of trials. 

One limitation of this manner of computing percent transfer seems 
fairly evident. This measure does not permit us to compare the percent 
transfer found on one task with the percent found on another, except 
through the arbitrary standard of number of trials. We cannot tell 
what degree of learning attained by a given number of trials in one 
task corresponds to what degree of learning attained by the same num- 
ber of trials in another task. The percent transfer found by this formula, 
for example, may be 60 percent on one task in which 10 trials accom- 
plishes only a small amount of the total possible learning; on another 
task, the percent transfer in the same number of trials may be only 20 
percent because of the fact that 10 trials of direct practice have been 
sufficient to bring the learning to a stage approaching completion. In 
contrast, a measure which related amount of transfer to the total pos- 
sible improvement in learning would make it possible to compare the 
percent that transfer was of the total learning in different tasks, regard- 
less of the number of trials which such total learning required in each 
case. 

Formula [3] is transformed in the usual manner when a measure of
learning is used which is a decreasing function of degree of practice. 
We may therefore express Formula [4] as: 


\[
\text{Per cent transfer} = \frac{\text{Initial C Group Score} - \text{T Group Score}}{\text{Initial C Group Score} - \text{C Group Score on Trial X}} \times 100. \qquad [4]
\]


The measurement of the total learning process in each of the two 
groups provides another way of giving direct expression to amount and 
direction of transfer. The amount of transfer may be expressed as a 
percent of the total learning. The zero point of such a scale is determined
by the initial learning score of the control group in which no
transfer has occurred. The 100 percent point is determined by a measure
or estimate of the limit of learning. The percent of transfer thus
expresses the degree to which transfer, as distinguished from direct 
practice, has contributed to total learning. Suppose the score made by 
the control group on an initial trial of the final task is 30, while that of 
the transfer group on the same trial is 45. Suppose also that the limit 
of learning of this task is known to be 50. If the score of the transfer 
group had been 30, we should have said the amount of transfer was 0; 
if it had been 50, we should have said the amount of transfer was 100 
percent. As it is, the percent of transfer may be expressed as the percent 
contributed to the total learning by transfer. That is: 


\[
\text{Per cent transfer} = \frac{\text{T Group Score} - \text{C Group Score}}{\text{Total Possible Score} - \text{C Group Score}} \times 100. \qquad [5]
\]





In the example given, this would be: 
\[
\text{Per cent transfer} = \frac{45 - 30}{50 - 30} \times 100 = 15/20 = 75\%.
\]


In contrast to the expression of transfer as “percent improvement,”
this measure makes use of a scale of improvement the limits of which are 
determined. Although it is probable that the units of this scale are 
unequal, because of the typical negative acceleration of learning, never- 
theless the expression permits a comparison of degree of transfer ob- 
tained in different studies. Thus, while it is not possible to maintain 
that the amount of increase in transfer represented by an increase of 50 
percent to 55 percent is equivalent to an increase from 70 percent to 
75 percent, it is possible to state that 75 percent transfer is greater than 
70 percent, or that 55 percent is greater than 50 percent. 
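
A brief sketch of Formula [5] shows how the two fixed points of the scale arise. The function name and the guard against a control score already at the limit are assumptions added for illustration; the scores are those of the example above.

```python
def percent_transfer_of_total(t_score: float, c_score: float, total_possible: float) -> float:
    """Formula [5]: transfer as a percent of total possible improvement,
    for measures that increase as learning proceeds."""
    if total_possible == c_score:
        raise ValueError("control group is already at the limit of learning")
    return 100 * (t_score - c_score) / (total_possible - c_score)


example = percent_transfer_of_total(45, 30, 50)        # the case in the text
zero_point = percent_transfer_of_total(30, 30, 50)     # transfer group equals control
hundred_point = percent_transfer_of_total(50, 30, 50)  # transfer group at the limit

print(example, zero_point, hundred_point)
```

The three calls give 75, 0, and 100 percent respectively, which is the sense in which the limits of this scale are determined.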

This manner of expressing transfer has been utilized most frequently 
in studies in which percent savings of trials or errors is measured. In 
such cases, the sign of each term must be reversed, because these learn- 
ing measures decrease in value with greater learning. The formula 
becomes: 


\[
\text{Per cent transfer (savings)} = \frac{\text{C Group Score} - \text{T Group Score}}{\text{C Group Score} - \text{Total Possible Score}} \times 100. \qquad [6]
\]


For example, suppose the control group learns the final task in 30 
trials, whereas the transfer group requires only 10. Suppose also that 
it is known, in the case of this activity, that the smallest number of 
learning trials which could be used to achieve a perfect score is 0 (or 
1, if it is measured, rather than assumed); i.e., if previous learning or 
transfer has been perfect, the number of additional learning trials needed 
to reach the criterion of complete learning is 0. (This is the usual case, 
though it is possible to conceive of examples in which the number of 
trials would be greater than 0.) In terms of trials, then, the total possible
learning score is zero, and the expression becomes:


\[
\text{Per cent transfer (savings)} = \frac{30 - 10}{30 - 0} \times 100 = 20/30 = 67\%.
\]


It is evident in this case, too, that a scale of improvement units is pro- 
vided whose zero point is fixed by the score of the control group, and 
whose 100 percent point represents total possible learning. 

It may be noted that this formula differs from the one used in computing
“percent improvement” in its possession of the term representing
total possible learning. In the case of learning measures which decrease 
with greater learning, such as number of trials or number of errors, the 
expressions sometimes reduce to the same numerical value, because of 
the zero term in the denominator which represents total possible learn- 
ing. When the learning measure is number of trials, the assumption that 
zero trials represents total possible learning is ordinarily a good one; 
when learning is measured by means of number of errors, there are 
more cases in which this assumption cannot be made; when time of 
response is measured, it is clear that the assumption of zero time as 
total possible learning is unjustified. Consequently, it would seem best 
to use the total expression containing the term which represents total 
possible learning, and to state that this term is zero when this assump- 
tion can legitimately be made. 
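
The effect of the term for total possible learning can be seen by computing Formulas [2] and [6] side by side. The sketch below is illustrative; the function names are assumptions, the trial counts are those of the example above, and the response times (with a floor of 6 seconds) are hypothetical values not taken from any study.

```python
def percent_improvement_decreasing(c_score: float, t_score: float) -> float:
    """Formula [2]: percent improvement for decreasing measures."""
    return 100 * (c_score - t_score) / c_score


def percent_savings(c_score: float, t_score: float, total_possible: float = 0.0) -> float:
    """Formula [6]: savings as a percent of total possible improvement.

    total_possible is the score representing complete learning
    (0 trials by default, an assumption that must be justified per task).
    """
    return 100 * (c_score - t_score) / (c_score - total_possible)


# Trials to criterion: complete learning taken as 0 further trials.
trials_f6 = percent_savings(30, 10, total_possible=0)
trials_f2 = percent_improvement_decreasing(30, 10)

# Hypothetical response times in seconds; best attainable time assumed to be 6.
time_f6 = percent_savings(12, 9, total_possible=6)
time_f2 = percent_improvement_decreasing(12, 9)

print(f"trials: [6] = {trials_f6:.0f}%, [2] = {trials_f2:.0f}%")
print(f"time:   [6] = {time_f6:.0f}%, [2] = {time_f2:.0f}%")
```

With zero as the limit the two formulas coincide (67 percent); with a nonzero limit they diverge (50 percent against 25 percent), which is why the full expression is to be preferred.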

Formulas [5] and [6] are by no means so satisfactory when an 
expression for negative transfer is desired. It will be recalled that zero 
transfer is taken as the learning score of the control group. Negative 
transfer would thus be said to occur when the score of the transfer group 
is below this value, in the case of Formula [5], or above it, in the case of 
Formula [6]. But it is more difficult to answer the question of how much 
below, or how much above, in each case. The point of 100 percent negative
transfer is difficult to establish. To correspond to a point of “total
possible learning,” we need a point of “total possible interference.”
The experimental operations for the determination of such a point have 
not as yet been devised. Consequently, using either of these formulas 
to derive an expression for percent of negative transfer leads one into 
the same difficulty experienced in the case of the measure of “percent
improvement”: the possession of a scale whose upper limit is undetermined.
For example, if we found that the control group made a learning
score of 30 (number correct) on the final task, while the transfer group
scored 5, the use of Formula [5], with 50 as a total possible learning
score, would give us:


\[
\text{Per cent transfer} = \frac{5 - 30}{50 - 30} \times 100 = \frac{-25}{20} \times 100 = -125\%.
\]


The use of these formulas ([5] and [6]) to express negative transfer
means, in effect, that one is finding the percent that amount of inter- 
ference is of total possible positive improvement or positive transfer. 
Although the meaning of such an expression is obviously more involved 
than one which would express amount of negative transfer as a percent 
of the total possible negative transfer, it may nevertheless have con- 
siderable usefulness for both theoretical and practical reasons. It will 
be noted that the value of such an expression may, at least theoretically, 
exceed 100 percent. If this happened, it would mean simply that the 
total possible amount of interference with an activity was greater than 
the total possible degree of learning of that activity. We do not know 
whether such a situation is possible or not. At any rate, this manner of 
expressing negative transfer does enable one to compare the amounts 
obtained in different studies, provided its meaning is clearly understood. 
Even more important than this, perhaps, is the fact that the expression 
permits a direct comparison of degree of positive transfer with degree of 
negative transfer on the same task, since both amounts are put on the 
same scale. 
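
The common scale can be illustrated by applying Formula [5] to a negative and a positive case on the same task. The sketch assumes a control score of 30, a transfer score of 5, and a limit of 50 for the negative case, these being the values consistent with the result of -125 percent given above; the positive case reuses the earlier example with a transfer score of 45. The function name is an assumption.

```python
def percent_transfer_of_total(t_score: float, c_score: float, total_possible: float) -> float:
    """Formula [5]: transfer as a percent of total possible improvement."""
    return 100 * (t_score - c_score) / (total_possible - c_score)


control, limit = 30, 50

negative_case = percent_transfer_of_total(5, control, limit)    # interference
positive_case = percent_transfer_of_total(45, control, limit)   # facilitation

print(negative_case, positive_case)
```

Both values are expressed in units of total possible positive improvement (here 20 score points), so that the interference of -125 percent and the facilitation of 75 percent may be compared directly, although only the positive side of the scale has a determined limit.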


Experimental Design 


Another factor which influences the way in which transfer is ex- 
pressed quantitatively is the particular experimental design employed. 
Woodworth (133) has summarized the five major variations in the de- 
sign of experiments on transfer. Three of these plans involve the use of 
matched groups of subjects, while the other two depend upon some 
means of matching the tasks. 

In the first of these designs (Plan 1), the following procedure is 
followed, quoting from Woodworth (133, p. 178): 


Practice Group: Foretest on B ...... Practice on A ...... After-test on B
Control Group:  Foretest on B ........................... After-test on B.


The two groups are matched, in this sort of experiment, on the basis of 
the foretest. So far as the expression of transfer is concerned, this design 
imposes the following limitations: 


1. Except for raw score measures, degree of transfer can usually be computed 
only by the use of Formulas [1] and [2], which yield “percent improvement” and
its attendant limitations, as discussed in the previous section. No expression for 
percent transfer (percent of total possible improvement) can ordinarily be 
obtained, because the experimental design makes no provision for the measure- 
ment of total learning. Of course, in some studies which have used this design, 
such a measure may have been obtained outside the experiment proper, or it 
may have been inferred, as could be done if the learning task were twelve non- 
sense syllables, for example. But in the case of many studies, it may be said 
that Formulas [5] or [6] could not have been employed because no measure of 
total learning was obtained. The after-test usually represents a relatively early 
stage of the total learning process. 
2. With this design, some of the improvement in score which is found in the
final task is attributable to the practice in that task occasioned by the foretest. 
Consequently, when Formulas [1] and [2] are used with this experimental de- 
sign, they are usually modified in such a way as to yield percent improvement 
due to transfer. 


The consequent modification of Formula [1] then becomes: 
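\[
\text{Percent Improvement} = \frac{(\text{T Group Score} - \text{Foretest Score}) - (\text{C Group Score} - \text{Foretest Score})}{\text{Foretest Score}} \times 100 \qquad [1a]
\]
\[
= \frac{\text{T Group Score} - \text{C Group Score}}{\text{Foretest Score}} \times 100
\]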


Thus, the zero point of this scale of improvement is taken to be the 
initial (foretest) score of the control group, as it is in the case of For- 
mula [1]. The foretest score of the transfer group is exactly the same as 
that of the control group, since this has been used as a matching cri- 
terion. Formula [2] can be similarly modified, since it differs only in 
sign. In addition, there have been a number of other specific procedures 
used in arriving at this measure of percent improvement which will be 
mentioned in a subsequent section. Their outstanding limitation is the 
fact that percent improvement is a measure which is dependent upon 
the raw score units, and does not permit a comparison with the percent 
improvement obtained with other tasks. 
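
The modification in Formula [1a] can be sketched as follows. The function name and the three scores are hypothetical, introduced only to show that the gain attributable to foretest practice is removed and that the expression reduces to a difference of after-test scores divided by the common foretest score.

```python
def percent_improvement_foretest(t_after: float, c_after: float, foretest: float) -> float:
    """Formula [1a]: percent improvement due to transfer under Plan 1.

    Both groups are assumed to share the same foretest score,
    since that score is the matching criterion.
    """
    t_gain = t_after - foretest   # transfer plus practice from the foretest
    c_gain = c_after - foretest   # practice from the foretest alone
    return 100 * (t_gain - c_gain) / foretest


# Hypothetical scores: foretest 20, control after-test 26, transfer after-test 32.
full_form = percent_improvement_foretest(t_after=32, c_after=26, foretest=20)
reduced_form = 100 * (32 - 26) / 20

print(full_form, reduced_form)
```

Both forms give 30 percent for these figures. The divisor is still a raw foretest score, so the dependence on the units of measurement noted for Formula [1] remains.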

A second design which utilizes matched groups is stated as follows 
by Woodworth (Plan 4): 


Transfer group learns A ............ learns B
Control group ...................... learns B.


It is clear that the major difference between this and Plan 1 is the fact 
that in both groups learning of the final task is carried on beyond the 
initial stage implied by a “test.” Consequently, transfer may in this
instance be expressed not only as “percent improvement” (Formulas 
[1] or [2]) over the initial score of the control group, but as percent which 
transfer contributes to some particular degree of learning. Our previous 
discussion indicates that the latter expression may, if it is desired, be 
chosen to represent percent that transfer is of the direct learning accom- 
plished in a given number of trials, according to Formulas [3] or [4]. 
However, the more useful measure, using Formulas [5] or [6], would 
seem to be that in which transfer is expressed as a percent of contribu- 
tion to total possible learning. Obviously the use of these formulas in 
this case requires that the learning of the final task be continued to the 
point where a measure of total learning can be obtained. When such 
measures of learning as number of syllables correct out of 12, or number
of trials needed to reach a criterion, are used, it may be possible to 
assume, rather than measure directly, the value of total possible learning 
for substitution in Formulas [5] or [6]. 

A third experimental plan which has sometimes been employed is 
Woodworth’s Plan 5 which he states as follows: 

Group I learns A ........... Learns B.
Group II learns B .......... Learns A (the data from A and B being pooled).


This method also requires the matching of groups, since it is necessary 
to insure that ability to improve is the same in both. One may think of 
this plan as one in which Groups I and II serve alternately as transfer 
and control groups. When so conceived, it is not essentially different 
from Plan 4. The usefulness of combining or pooling the data from both 
groups may be questioned, since the result is to contaminate the measure 
of the initial learning score in either group. The procedure seems par- 
ticularly questionable for the reason that in many instances there are no 
a priori reasons to expect that the transfer which takes place from Task 
A to Task B should be at all the same as the transfer which takes place 
from B to A. A full discussion of this point is, however, beyond the scope 
of the present paper. This plan provides the same possibilities for meas- 
uring transfer as does Plan 4, since it involves the learning of a final 
task up to any desired stage, including the stage of total learning. For- 
mulas [5] and [6] may consequently be employed when this plan is used, 
as well as Formulas [1] and [2], or [3] and [4]. 

The remaining plans for experimental design both involve the 
matching of tasks rather than the matching of groups. According to 
Woodworth, they are stated as follows: 

Plan 2: ............ After-test on B.
Plan 3: ............ learns B.


In both these plans, no control group is necessary because the two 
tasks are matched in difficulty and consequently the same degree of 
improvement resulting from a given amount of practice can be expected 
on each. The initial score on Task B without transfer can therefore be
taken to be equivalent to the initial score on Task A, and the substitu- 
tion of these values can be made in the appropriate places in the formu- 
las. The expressions of transfer which may be used with these plans are 
determined by the extent of practice given on the initial and final tasks. 
If the amount of practice on Task A is limited to the initial stages of 
learning under Plan 2, percent gain may be computed according to 
Formulas [1] or [2]. If the practice on Task A is carried on in some par- 
ticular number of trials, Formulas [3] and [4] may be used with either 
plan. The use of Formulas [5] and [6] is possible when the learning of
either Task A or Task B has been carried to a stage approaching completion.

Other Measures of Transfer


Two other methods of expressing amount of transfer have been used 
relatively infrequently. A few comments may be made on each of these 
methods. 

Standard scores. In some studies, the amount of gain resulting from 
transfer has been expressed as an increase in sigma units above the mean 
of the scores on the initial test, when experimental Plans 1 or 2 have 
been followed. This sort of measure seems to have no greater value than 
the gain expressed in terms of raw score or expressed as a percent accord- 
ing to Formulas [1] or [2]. Obviously, the standard score unit is a mea- 
sure of the degree to which transfer has raised the score obtained on the 
initial test, in terms of the variability of performance on this initial test. 
It does not relate transfer to the amount of improvement to be expected 
in the task under consideration. Consequently, it is comparable in 
meaning to the value obtained from Formulas [1] or [2] which express 
the gain in terms of the initial score. Since a measure of variability of 
performance in the initial test provides us with no means of estimating 
the improvement to be expected in the given task, it is impossible to 
relate transfer to amount of improvement attained by some degree of 
direct practice. Although the standard score makes the value of the 
transfer obtained independent of the variability in different tasks, it 
does not prevent its variation with the amount of improvement to be 
expected in different tasks. Consequently, quantitative comparisons of 
amount of transfer between two tasks are not possible with this measure. 

Coefficients of correlation. In some studies, evidence of transfer is 
found by comparing the value of the coefficient of correlation measured 
initially with the correlation between these tasks after training has been 
given on one of them. It seems evident that, if the value of the correla- 
tion coefficient increases after training in one of the tasks, positive trans- 
fer is indicated; if a decrease is found, negative transfer is indicated. 
This method has been used particularly in studies which have concerned 
themselves with the theory of identical elements. Presumably, the 
degree of correlation found would be expected to increase with the 
number of identical elements present in the two tasks under consider- 
ation. It is difficult to see how this measure can be used to give a direct 
quantitative estimation of the amount of transfer, since the coefficient 
itself does not provide a measure of the amount of improvement to be 
expected as a result of direct practice on the final task. It is thus subject 
to a limitation similar to that mentioned for standard scores, namely, 
that it does not permit relating the amount of transfer to the amount of 
improvement possible in the task, as measured experimentally. In most 
cases, the studies which have employed this measure have also pre- 
sented their results in other terms, such as raw scores, which make 
possible the relation of amount of transfer to some degree of direct 
practice. 
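
The comparison of correlations before and after training may be sketched as follows. The scores are hypothetical and `pearson_r` is a helper written for this illustration; the sketch shows only how the direction of transfer would be read from the two coefficients, not any procedure taken from the studies cited.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of five subjects on two tasks, before training.
task_a_before = [1, 2, 3, 4, 5]
task_b_before = [2, 1, 4, 3, 5]

# The same subjects after training on task A only.
task_a_after = [3, 4, 5, 6, 7]
task_b_after = [2, 3, 5, 6, 8]

r_before = pearson_r(task_a_before, task_b_before)
r_after = pearson_r(task_a_after, task_b_after)

# A rise in r is read as positive transfer, a fall as negative transfer.
direction = "positive" if r_after > r_before else "negative or none"
```

The change in the coefficient gives a direction but carries no information about how much improvement direct practice on the second task would have produced.
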

CRITERIA OF TRANSFER MEASURES 


Our discussion up to this point has indicated that a number of factors 
may influence the type of expression employed to quantify data
obtained in transfer of training experiments. In particular, the measures 
of learning which are used, the extent of the learning which is measured, 
and the experimental design which is followed, all provide a number of 
possible limitations to the measure of transfer which can be employed. 
It would seem desirable, then, to attempt to state some criteria which 
measures of transfer must meet if certain purposes are to be fulfilled, in 
order that experiments in this field may be designed with full apprecia- 
tion of the limitations of each measure. This section will attempt to 
summarize the desirable characteristics of transfer measures which have 
become evident from our consideration of the limitations imposed by 
experimental factors. 

A measure of transfer may be desired for either or both of two major 
purposes. One of these may be called a theoretical purpose. That is, 
measures of transfer may be obtained primarily for the purpose of re- 
lating the experimental findings to some hypothesis or general theory of 
learning. A second purpose for which the measure of transfer may have 
been obtained is, of course, a practical one. Transfer has been measured 
primarily for this purpose in a great many of the studies which are re- 
ferred to in a subsequent section of this paper. Broadly speaking, the 
practical reason for obtaining a measure of transfer is to determine to 
what extent a particular kind of training affects either subsequent train- 
ing or subsequent proficiency in some learned activity. It is conceivable 
that the measures of transfer obtained for each of these purposes might 
entail quite different criteria of usefulness. On the other hand, it is 
possible that many characteristics of the transfer expression might find 
equal usefulness for both theoretical and practical purposes. The dis- 
cussion which follows is an attempt to describe the features of transfer 
measures which seem desirable for each of these purposes and to discover 
whether common characteristics may exist for both. 


Transfer Measures for Theoretical Purposes 


In recent years, there have been a number of attempts to relate the 
phenomenon known as transfer of training or proactive facilitation and 
inhibition to a more basic learning theory. McGeoch (80) points out 
that primary generalization may be considered to fall within the general 
class of events which are known as transfer of training, and discusses the 
possibility of deriving a theory of transfer based upon this fundamental 
concept. A number of articles by Spence (111, 112, 113) have indicated 
the possibility of relating transfer, as it is exhibited in transposition
behavior, to the phenomena of generalization and discrimination. In 
studies of the learning of paired associates, Gibson (46) has shown that 
the phenomena of retroactive inhibition and transfer may be derived 
from a hypothesis based upon the concepts of generalization and dis- 
crimination. Without attempting to review these studies specifically, it 
may be pointed out that if the various specific learning situations in 
which transfer of training takes place are to be related to generalization, 
a primary characteristic of transfer measures which would seem desir- 
able is that they should be expressed either in the same terms as general- 
ization itself, or that the measure employed should be readily trans- 
formable into values which are also meaningful measures of generaliza- 
tion. 

Generalization is ordinarily measured in such a way that it can be 
related directly to a measure of direct learning. One usually finds it 
necessary, for theoretical purposes, to state the degree to which the 
generalized response measures up to a standard provided by the origi- 
nally conditioned response. This is often done simply by comparing the 
actual raw score values of the conditioned and the generalized responses. 
In some instances, the amount of generalization is stated as a percentage 
of the originally learned response. This method is used by Anrep (2) to 
report the generalization of conditioned reflexes. Another example is 
Gibson’s study (47), in which degree of generalization of responses made 
to nonsense forms is given as a percent of the responses to be expected 
to a set of standard figures which, if perfectly learned, would have the 
value of 100 percent. A similar expression is found in an earlier study 
by Yum (136). For theoretical purposes, it is often convenient to relate 
degree of generalization to some measure of total possible learning or to 
the “physiological limit.” Hull’s (62) statement of Major Corollary I to 
Postulate 5, dealing with generalization, contains the term M, which 
represents the physiological limit of the learned response. In this par- 
ticular instance, for convenience in computation, M is assumed to be 
100. A number of other examples (cf. 59) could be cited to illustrate the 
utility for theoretical treatment of expressing amount of generalization 
as some proportion or percent of the strength of the originally condi- 
tioned reaction established by direct practice. 
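
As a compact restatement of the percentage expression just described, let $R_o$ stand for the strength of the originally conditioned response and $R_g$ for the strength of the generalized response. These symbols are assumptions introduced here for illustration and are not the notation of the studies cited.

$$G = \frac{R_g}{R_o} \times 100$$

If the originally conditioned response is taken to stand at a limit of 100, $G$ is numerically equal to $R_g$.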

The relation of a measure of transfer to degree of learning attained 
by direct practice may be accomplished, first of all, by comparing raw 
scores, provided that these scores include some measure which permits 
the estimation of total possible learning. A second way of expressing 
transfer which permits comparison with generalization measures is to 
state the percent which the response strength resulting from transfer is 
of the total response strength possible in the particular situation. Evi- 
dently, then, this criterion of theoretical usefulness would limit the ex- 
pression of transfer to either a raw score value or to expressions obtained 
from Formulas [5] or [6], in which transfer is measured as a proportion 
of the total possible learning. In either case, the data obtained could 
readily be utilized to compute the measure of generalization which is 
desired, or for substitution in a mathematical expression which includes 
total possible learning (physiological limit) as a term. In some instances, 
it might be desirable to express transfer as a proportion of the strength 
of the learned response in a given number of trials, according to Formu- 
las [3] and [4]. 

For theoretical purposes, it would seem desirable at the present 
stage of our knowledge to be able to relate the amount of transfer ob- 
tained on one kind of learning task to that obtained on another kind. 
Since learning tasks differ in difficulty, it is necessary to utilize some 
measure by means of which they can be related, and this is provided by 
a measure of total possible learning. For purposes of theory, it is in 
many instances of relative unimportance that a certain task requires 20 
trials for the response to reach some criterion of strength, and that an- 
other task requires 40 trials. If the learning on both of these tasks is to be 
taken to indicate certain general characteristics of the learning process, 
what is important is the amount of response strength as a function of 
successive proportions of the total learning process. It would, of course, 
be desirable to obtain all the measures of learning necessary for purposes 
of theoretical reasoning from a single task, but such data do not exist at 
present. Lacking them, it should nevertheless be possible to come to 
some conclusions concerning the factors which influence transfer, pro- 
vided that the different tasks on which transfer has been measured can 
be compared up to and including the stage at which learning is complete. 
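
The idea of comparing tasks at successive proportions of the total learning process, rather than at equal numbers of trials, may be sketched as follows. The two learning curves are hypothetical and were constructed to have the same shape; the function name and the use of linear interpolation are assumptions made for this illustration.

```python
def strength_at_proportion(curve, proportion):
    """Response strength at a given proportion (0 to 1) of the total learning process.

    curve[i] is the strength after i trials; the last entry is the criterion.
    Values falling between trials are linearly interpolated.
    """
    position = proportion * (len(curve) - 1)
    lower = int(position)
    upper = min(lower + 1, len(curve) - 1)
    fraction = position - lower
    return curve[lower] + fraction * (curve[upper] - curve[lower])

# Hypothetical curves: one task reaches criterion (100) in 20 trials, the other in 40.
task_20 = [5.0 * i for i in range(21)]
task_40 = [2.5 * i for i in range(41)]

# Compared at the same trial number, the two tasks look quite different.
after_10_trials = (task_20[10], task_40[10])

# Compared at the same proportion of total learning, they coincide.
at_half = (strength_at_proportion(task_20, 0.5),
           strength_at_proportion(task_40, 0.5))
```

With real data the two curves would rarely coincide, but placing them on a common proportional scale is what permits the comparison at all.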

Another reason why learning of the final task beyond an initial point 
is desirable in transfer experiments is that such a procedure provides the 
possibility of measuring the transfer effect at a number of stages in the 
learning process. This information is of considerable importance in 
formulating a complete theoretical account of transfer of training. One 
of the significant problems in present-day learning theory concerns the 
variation in degree of generalization as a function of the degree of learn- 
ing. If transfer is to be related to the concept of generalization, it seems 
necessary also to obtain measures of the amount of transfer at different 
stages of the total learning process. That is to say, systematic informa- 
tion is needed concerning the way in which amount of transfer varies 
with different amounts of practice on the initial task (cf. 14, 26, 60, 64, 
72, 84, 98, 106, 110, 128), and also concerning the way in which final 
learning is affected throughout its course by given amounts of initial 
training (cf. 6, 26, 30, 35, 63, 64, 66, 79, 132). 

In summary, it may be said that if a measure of transfer is to be of 
greatest usefulness for purposes of theoretical analysis, it should be 
secured under conditions in which some measure is obtained of the 
point at which learning starts, the point at which learning may be said 
to be complete, and of a number of intervening stages in the learning 
process. As has been pointed out, the studies in which such measures of 
learning have been obtained usually make possible the expression of 
transfer either as a raw score which can be compared with the raw scores 
representing amount of learning, or as a percent that the response 
strength produced by transfer is of the total response strength possible. 
Formulas [1] and [2] express transfer as a percent gain of some initial 
value which in itself provides no measure of the total possible learning. 
Formulas [3] and [4], although they do involve the relating of transfer to 
some degree of direct practice, do not specify this degree, but rather 
depend upon the measurement of learning in an arbitrary number of 
trials. Formulas [5] and [6], on the other hand, have two distinct ad- 
vantages: (a) they relate transfer directly to the total learning possible, 
or, in other words, to a measure comparable to “strength of originally 
conditioned response”; and (b) they yield a measure of transfer which is 
independent of variations in the rate of learning of different tasks em- 
ployed in transfer experiments, and thus permit comparisons to be made 
between studies. 


Transfer Measures for Practical Purposes 


As we have pointed out, the general practical reason for obtaining 
measures of transfer is to permit a measure of the degree to which the
learning of one activity influences the learning or performance of some
subsequent activity. In this case, too, it would seem desirable that the
measure of this effect be obtained for more than one stage of the final
learning process. A number of studies have shown that transfer may change 
both its value and its direction as one proceeds with the learning of the 
initial task and of the final task. It would surely be unfortunate if a 
study of transfer indicated that some degree of negative transfer could be 
expected from the final task when only the initial portion of the learning 
of that task had been measured, whereas a more complete investigation 
showed that when the final learning was considered in its totality, some 
positive transfer could be expected to occur. The results of such studies 
as have been performed on this problem (e.g., 6, 63, 64, 132) certainly 
indicate that the amount of transfer obtained when the initial portion of 
the learning of the final task is measured provides no reliable indication 
of the amount to be expected when transfer to some later stage of learn- 
ing is considered. There is, therefore, a very good reason for carrying 
the learning of the final task beyond an initial point, and, if possible, to 
the point where total learning can be said to have occurred. 

There are many practical situations in which it is desirable to
compare the amount of transfer produced by training on a given initial task 
to a number of different final tasks. Perhaps the best known example of 
this comparison is to be found in those studies which have attempted to 
measure the transfer of an academic subject such as Latin to learning or 
proficiency in a variety of other subjects. In the learning of motor skills, 
too, one can find many instances of attempts to measure the amount of 
transfer from the learning of such a task as tracing a finger maze to the 
learning of other mazes. In these cases, the expression of transfer as per- 
cent gain over some initial score of a control group on the final task may 
encourage a comparison of one with the other which can only be mis- 
leading. The amount of possible improvement in a given final task, such 
as writing English compositions, may be quite different from the amount 
of possible improvement in another final task, such as reading compre- 
hension. Likewise, the amount of possible improvement in one maze 
selected for a final task may be different from the amount of possible 
improvement in another. Expressing transfer as simply the percent 
gain over an initial score in these tasks is an undesirable measure with 
which to compare degrees of transfer. Evidently, what is needed is a 
measure of transfer which takes into account the total improvement 
possible on the various final tasks under consideration. Therefore, it 
would seem necessary for purposes of comparison that the transfer be 
expressed as a percent of the total possible learning in each task. 
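
The contrast between the two expressions may be made concrete with a short sketch. The scores and ceilings are hypothetical. The first function follows the percent-gain description given for Formula [1]; the second is an assumed form of the percent-of-total-learning expression, in which the gain is divided by the improvement possible between the control score and an assumed total possible score. The exact statement of Formulas [5] and [6] should be taken from their definition elsewhere in the paper.

```python
def percent_gain(transfer_score, control_score):
    """Gain of the transfer group over the control group, as a percent of the control score."""
    return 100.0 * (transfer_score - control_score) / control_score

def percent_of_total_learning(transfer_score, control_score, total_possible):
    """Assumed form: the same gain as a percent of the improvement possible on the task."""
    return 100.0 * (transfer_score - control_score) / (total_possible - control_score)

# Hypothetical final tasks with identical control and transfer scores
# but different amounts of possible improvement.
tasks = {
    "composition": {"control": 20, "transfer": 30, "total_possible": 40},
    "reading": {"control": 20, "transfer": 30, "total_possible": 120},
}

results = {
    name: (
        percent_gain(t["transfer"], t["control"]),
        percent_of_total_learning(t["transfer"], t["control"], t["total_possible"]),
    )
    for name, t in tasks.items()
}
```

Percent gain is the same for both hypothetical tasks, whereas the share of the possible learning accomplished by transfer is five times greater in the first than in the second.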

For practical purposes, it is desirable that the measure of transfer 
employed make as much sense as possible, not only to the investigator, 
but to others who are expected to translate the results of his experiments 
into terms of concrete action. The expression of transfer as percent 
gain, while it may seem at first glance to have a perfectly obvious mean- 
ing, may, in a number of practical situations, be seen to have little. 
Usually, the practical man is concerned with arranging the conditions of 
instruction in some subject or task in such a way that training will be 
most efficiently accomplished. To provide him with the information that 
training on a certain type of initial task produces a certain percent of 
gain in a final task is a very incomplete answer to this practical problem. 
What he usually needs to know in such a situation is to what extent 
training on an initial task increases the efficiency of learning on a final 
task. A way of expressing this which is easy to understand is by stating 
that training on the initial task accomplishes a certain percent of the 
total learning necessary to bring the learner to a given stage of profi- 
ciency. Practical decisions can readily be made on the basis of such in- 
formation. For example, one can determine immediately which of two 
alternative initial tasks has produced the greater effect upon the
learning of a given final task, or one can determine to which of two alternative 
final tasks the transfer from a given initial task has been greater. Per- 
haps even more important, the expression of transfer as percent contri- 
bution to the total learning makes possible an immediate determination 
of whether the initial task has produced an effect which is greater than 
would be accomplished by an equal amount of effort spent in direct 
learning of the final task. It seems evident, therefore, that transfer as 
percent of total learning has a somewhat greater degree of practical 
meaningfulness than does the expression of transfer as percent gain over 
some initial learning score. 

Summarizing, there seem to be a number of practical reasons for 
preferring a measure of transfer which relates amount of transfer to a 
measure of total possible improvement, to an expression which provides 
simply a statement of percent of gain. Formulas [5] and [6] may be 
said, therefore, to yield an expression of transfer which best fulfills the 
criterion of practical usefulness. Formulas [1] and [2], which provide an 
expression of per cent gain, are less desirable because they do not permit 
accurate comparison of amount of transfer in learning tasks which 
provide different degrees of possible improvement, and because the ex- 
pression of percent gain does not permit an answer to the practical 
question of the extent to which transfer contributes to the total learning 
process. Formulas [3] and [4] provide a measure of transfer which re- 
lates amount of transfer to some particular number of trials of direct 
practice. Consequently, they do not make it possible to compare the 
transfer obtained on different tasks at comparable stages of the total 
learning process. 


TRANSFER MEASURES EMPLOYED IN STUDIES OF TRANSFER 


The preceding discussion has indicated the major ways in which 
measures of transfer may be expressed. In the present section, the use 
of these expressions will be referred specifically to the studies in which 
they have occurred. These references are summarized under headings 
which correspond to the following classifications: 

1. Studies in which transfer has been expressed solely as a raw score value 
where this value represents either (a) amount of learning, or (b) amount of 
transfer as computed by obtaining the difference between a learning score and a 
transfer score; 

2. Studies in which a learning score has been expressed as a percent of total 
learning; 

3. Studies in which transfer is expressed as percent improvement over some 
initial score; 

4. Studies in which transfer is expressed as a per cent of the learning to be 
accomplished in a given number of trials; 

5. Studies in which transfer is expressed as a percent of the total learning 
possible; 

6. Studies which have used correlation coefficients as expressions of transfer. 


The review of the literature contained in the present section includes 
references to all relevant studies of transfer which have appeared sub- 
sequent to 1920. References are also given to a number of earlier studies 
which seem representative of the different approaches to the measure- 
ment of transfer, or which have contributed a distinctive method of 
measuring transfer. There are a number of articles which contain ex- 
tensive reviews of the literature on transfer of training (26, 28, 55, 87, 
93, 94, 95, 101, 105, 115, 122, 124). In addition, several references
contain discussions of the methods which have been used to measure
transfer (24, 68, 94, 95, 105, 124, 133). 


Transfer Expressed as a Raw Score 


Many studies of transfer have reported only the raw scores attained 
in the learning of both the initial and final tasks. The raw scores may 
indicate simply amount of learning, or they may express amount of trans- 
fer as a difference obtained by subtracting the learning score of a con- 
trol group from that of an experimental group. 

Scores expressed as amount of learning. Studies of transfer which have 
expressed a raw score representing number of correct responses are to be 
found in references 8, 36, 41, 44, 52, 61, 73, and 76. In experiments on 
the transfer of conditioned responses (48, 50, 70, 71, 125, 126), the score 
reported represents the number of occasions on which the transferred 
conditioned response is exhibited. The raw scores attained on tests (112) 
usually indicate the number of correct responses, although in some in- 
stances the scoring formula may involve the subtraction of a weighted 
error score. Raw scores representing number of errors as a measure of 
learning are used in these studies: 27, 35, 52, 66, 81, 84, 90, 108, 116, and 
127. The time required to learn a given task is used as a raw score to 
indicate transfer in studies 27, 35, 66, 81, 119, and 127. Mean reaction 
time taken for responses in the final task is compared with mean reaction 
time in the initial task in a study by Siipola (108). Number of trials re- 
quired to reach a criterion is a raw score which is usually accompanied by 
other measures. It is used to show transfer in studies 11, 52, 61, 66, 76, 
81, 90, and 127. 

Scores expressing amount of transfer. The difference between the raw 
scores in experimental and control groups in a transfer experiment is 
frequently given as a net transfer score. The differences between the 
learning scores of these two groups in terms of number of correct re- 
sponses are utilized to indicate amount of transfer in studies 29, 31, 56, 
and 102. The scores achieved on intelligence tests and achievement 
tests are usually derived from a count of the number of correct re- 
sponses made by the subject. Differential gains in such test scores 
are reported as evidence of transfer in studies 13, 38, and 103. In a 
number of studies, learning scores express the number of reactions 
of particular sorts which occur within given time intervals. An in- 
crease of seven rotations in a clover-leaf maze is taken as a measure of 
transfer in a study of bilateral transfer (86). Differences in gain between 
control and experimental groups were calculated in two studies in which 
a variety of motor tasks was employed for initial and final learning 
(45, 74). Error score differences are found in studies 6, 63, 90, 100, 102, 
and 120. Differences in the time taken by the control and experimental 
groups to perform a given task or to make a given response are reported 
as evidences of transfer in studies 6, 30, 63, 72, 100, 120, and 132.
Transfer is indicated by the differences between number of trials required to 
learn to a given criterion in the two groups in the studies numbered 
63, 90, and 120. 


Expressions of Per Cent Learning 


Several studies report the percentage which a raw score is of total 
learning. Such a percent score represents neither percent improvement 
nor percent transfer, but simply the proportion of total learning which 
has occurred in each group. With this expression, no subtraction has 
been performed, but a comparison of the percent scores attained by the 
experimental and control groups must be made by the reader in order to 
find evidence of transfer. Despite superficial differences, most of the 
investigations employ the same measure of learning, i.e., frequency of 
correct response. 

Percent correct responses. In studying the transfer of discrimination 
of forms between a right and left eye, Franz and Layman (42) indicate 
transfer by comparing the percent of correct responses obtained with 
each eye. Karn and Patton (67) investigated the transfer of double al- 
ternation behavior in mazes identical except in size. These authors com- 
pare the percent of correct runs on the initial and two final mazes. Mc- 
Kinney (82) also uses percent correct responses as a measure of learning 
of discrimination habits in rats. In another study, McKinney (83) cal- 
culates the percentage of time the subject responds to an altered form of 
the stimulus as if it were the original. This author also suggests that a 
measure of variability indicating consistency in responding may be in- 
terpreted as a tendency to transfer. In a study by Gulliksen (15) sub- 
jects were trained to give two antagonistic responses to each of two 
classes of stimuli. In measuring the learning of the final task, the number 
of correct responses was divided by the total number of responses made. 
In this case, a score of 100 percent represents complete learning, and 
zero learning is indicated as 50 percent, which Gulliksen refers to as 
“ambiguous transfer.” 

Percentage of subjects responding. A learning score may be obtained 
by finding the percentage of subjects out of the total group who make a 
given response. Sears and Hovland (106) report the proportion of sub- 
jects responding with each of four modes of resolutions to incompatible 
avoidance responses. Studies of the transfer of conditioned responses 
frequently report the percentage of subjects exhibiting the new, or trans- 
ferred, conditioned responses (49, 107). Katona (68) demonstrates the 
effect of a number of different practice methods on the transfer to
various tasks, using as a learning measure the “percent of perfect solution.” 
In some experiments, this author reports the percent of subjects attain- 
ing the three highest and the three lowest scores on a variety of tasks, 
following several methods of initial learning. Dorsey and Hopkins (39) 
indicate transfer by reporting the percent of the transfer group which 
overlaps the mean score of the control group. Similarly, the study of 
Martin (78) on transfer in cancellation tests reports percentage of the 
practice group reaching or exceeding the average speed and accuracy of 
the control group. 
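
The overlap type of expression may be sketched as follows. The scores are hypothetical and the function name is introduced only for illustration; the sketch assumes a score which increases with learning.

```python
def percent_reaching_control_mean(transfer_scores, control_scores):
    """Percent of the transfer group whose score reaches or exceeds the control mean."""
    control_mean = sum(control_scores) / len(control_scores)
    reaching = sum(1 for score in transfer_scores if score >= control_mean)
    return 100.0 * reaching / len(transfer_scores)

# Hypothetical scores on the final task.
control_group = [10, 12, 14]        # mean of 12
transfer_group = [11, 12, 15, 16]   # three of four reach or exceed 12

overlap = percent_reaching_control_mean(transfer_group, control_group)
```

An expression of this kind shows whether the transfer group tends to surpass the control group, but not by how much.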


Transfer Expressed as Per Cent Improvement 


Our previous discussion has indicated the possibilities of variation in 
the expression of transfer as percentage of improvement over an initial 
score. In those studies in which increased learning is represented by an 
increase in score, as is the case when such learning scores are employed 
as number of responses correct, number of tasks completed in a given 
interval, or a standardized test score, the improvement is computed by 
subtracting the score of the control group from the score of the transfer 
group in the final task (65, 85, 98, 101). In such cases, percent improve- 
ment is found by dividing the amount of improvement by the initial 
score made by the control group. This method of expressing transfer has 
been referred to in the present paper as Formula [1]. If the experiment 
has included a foretest as well as an after-test, the improvement of both 
the control group and the transfer group may be found by subtracting 
each group’s initial score from its final score (7, 12, 22, 23, 37, 43, 69, 
88, 97, 117, 130, 131). Percent improvement due to transfer may then 
be obtained by substituting the appropriate values in Formula [1a]. 

In some studies, percent improvement in each group is first obtained 
separately by dividing the improvement by the foretest score in each 
case. Percent improvement due to transfer is subsequently obtained by 
subtracting these two percentages (12, 88, 131). This method may be 
shown to be equivalent to that summarized by Formula [1a], since the 
foretest scores of the two groups are equal, having been used for the
purpose of matching groups. Thus the expression: 

$$\frac{\text{After-test } T - \text{Foretest } T}{\text{Foretest } T} - \frac{\text{After-test } C - \text{Foretest } C}{\text{Foretest } C}$$

reduces to Formula [1a], if it is assumed that foretest T equals foretest 
C. Several other variations of Formula [1a] are to be found. Archer (3) 
subtracts the mean improvement of the control group from the mean 
improvement of the experimental group, and then expresses this differ- 
ence as percent of an original learning score obtained in the foretest. 
Again this formula reduces to Formula [1a] when the foretest scores of 
the control and experimental groups are equal. Razran (98) and Winch 
(130) compute percent improvement according to Formula [1a] as it has 
been stated in this report. 
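
The equivalence of these variations when the groups are matched on the foretest may be checked with a short sketch. The scores are hypothetical and the function names are introduced for illustration. Formula [1a] itself is not restated here; the sketch only compares the two computations described in the text, namely the difference of the two percent gains and the difference of the two raw gains taken as a percent of a foretest score.

```python
def difference_of_percent_gains(fore_t, after_t, fore_c, after_c):
    """Percent gain of each group on its own foretest, then the difference."""
    gain_t = 100.0 * (after_t - fore_t) / fore_t
    gain_c = 100.0 * (after_c - fore_c) / fore_c
    return gain_t - gain_c

def gain_difference_over_foretest(fore_t, after_t, fore_c, after_c, base):
    """Difference between the two groups' raw gains, as a percent of one foretest score."""
    return 100.0 * ((after_t - fore_t) - (after_c - fore_c)) / base

# Hypothetical matched groups: both foretest scores are 20.
matched_a = difference_of_percent_gains(20, 35, 20, 25)
matched_b = gain_difference_over_foretest(20, 35, 20, 25, base=20)

# Hypothetical unmatched groups: the foretest scores differ (25 and 20),
# and the control foretest is used as the base.
unmatched_a = difference_of_percent_gains(25, 40, 20, 25)
unmatched_b = gain_difference_over_foretest(25, 40, 20, 25, base=20)
```

With equal foretest scores the two computations agree; with unequal foretest scores they do not, which is why the matching of groups matters for the equivalence.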

When the learning measure undergoes a decrease in value with in- 
creased goodness of performance, as is the case with number of errors, 
time of performance, or number of trials required to learn, the same 
general method is used to compute percent improvement. In order to 
indicate positive transfer by a positive number and negative transfer by 
a negative number, the transfer group score is subtracted from the con- 
trol group score (16, 17, 20, 21, 24, 32, 64, 77, 79, 122, 134); or the after- 
test score is subtracted from the foretest score (3, 5, 9, 18, 23, 32, 34, 
40, 53, 57, 58, 79, 89, 91, 97, 104, 109, 121, 131). In the latter case, per- 
cent improvement due to transfer in excess of the direct practice pro- 
vided by the foretest may be obtained by subtracting the percent im- 
provement of the control group from the percent improvement of the 
experimental group. As has been pointed out previously, the formulas 
used to obtain percent improvement with this type of learning measure 
are [2] and [2a]. 

Minor variations of Formula [2] are found in a number of studies in 
which improvement is expressed as percent that the transfer group 
score is of the control group score, i.e., 

$$\text{Per cent improvement} = \frac{T\ \text{Group Score}}{C\ \text{Group Score}} \times 100 \qquad [2b]$$

(4, 10, 14, 15, 60, 99), or as a ratio of the after-test score to the foretest 
score (75). Actually, this formula yields an expression which is compa- 
rable to that given by Formula [2], except that the point of zero transfer 
is shifted from zero to 100. A value of less than 100 indicates positive 
transfer, and a value greater than 100, negative transfer. The value of 
the expression obtained from Formula [2] is the same as 100 percent 
minus the value obtained from this formula [2b]. That is: 

$$100 - \frac{T\ \text{Group Score}}{C\ \text{Group Score}} \times 100 = \frac{C\ \text{Group Score} - T\ \text{Group Score}}{C\ \text{Group Score}} \times 100$$

Britt (10) and Bunch (15) obtain their measures of transfer by sub- 
tracting the percentage which the transfer group score is of the control 
group score from 100, as indicated. This expression is therefore equiva- 
lent to that obtained by the use of Formula [2], in which zero transfer 
has a score of 0. 
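
The relation between the two expressions may be verified numerically. The sketch below assumes a learning measure which decreases with improvement, such as number of errors; the scores are hypothetical and the function names are introduced for illustration.

```python
def formula_2(t_score, c_score):
    """Percent improvement: control minus transfer score, as a percent of the control score."""
    return 100.0 * (c_score - t_score) / c_score

def formula_2b(t_score, c_score):
    """Transfer group score as a percent of the control group score."""
    return 100.0 * t_score / c_score

# Hypothetical error scores: (transfer group, control group).
cases = {
    "positive transfer": (30, 40),
    "zero transfer": (40, 40),
    "negative transfer": (50, 40),
}

table = {
    label: (formula_2(t, c), formula_2b(t, c))
    for label, (t, c) in cases.items()
}
```

In each hypothetical case the first value equals 100 minus the second, so that zero transfer falls at 0 on the one scale and at 100 on the other.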

The expressions of transfer obtained by the use of Formulas [1], [1a], 
[2], and [2a] yield a positive percentage to indicate positive transfer and 
a negative percentage to indicate negative transfer. Several studies, 
however, have been concerned with indicating per cent inhibition. In 
the studies of Melton and Von Lackum (85) and of Ray (97), the per 
cent of inhibition is found by dividing the difference in scores of the con- 
trol and transfer groups by the control group score. It is evident that a 
simple reversal of signs has been made in this case, since a positive in- 
hibition means a negative transfer. 

Many of the studies which utilize Formulas [1], [1a], [2], and [2a] to 
express improvement resulting from transfer report the measurement of 
the performance scores on both the initial and final tasks during only the 
initial stages of the learning, often by means of a “test” which samples 
the performance on the first trial or on the first few trials (3, 5, 7, 12, 22, 
23, 32, 33, 37, 40, 65, 69, 89, 97, 101, 130). In other studies, the values 
used represent the learning scores of both control group and experimen- 
tal group, having been derived from a measurement of learning carried 
well beyond the initial stage. This is the case when such measures are 
used as number of trials to reach a criterion, total or average errors, and 
total or average time on all trials (9, 16, 17, 18, 19, 20, 21, 57, 58, 64, 77, 
79, 85, 98, 109, 117, 118, 122, 131). In some cases in which learning has 
been carried beyond an initial stage, learning curves are presented 
which make possible a comparison of the improvement of both control 
group and transfer group at various stages in the learning. Melton and 
Von Lackum (85) compute the percent inhibition for each of five learn- 
ing trials. In this case, the numerator of the formula is the difference 
between the score of the transfer and control groups on each trial; the 
denominator is the score of the control group on the same trial. In the 
study of Sanderson (104), the time of the last trial of the initial test is 
employed as a measure of the initial performance (C Group Score), and 
the increase from that score of the average of the first three, the first 
five, the last three, and the last five trials of the final task is indicated as 
an increase in time and as percent improvement. 

Cumulative transfer obtained in successive tasks of the same type 
has been measured by Marx (79), Wiltbank (128), Bunch (18), and 
Higginson (57, 58). In these studies, percent improvement is computed 
in the usual manner, with the initial learning score as a base to which 
the measure of each succeeding learning is compared. Higginson (57, 
58) also computes percent improvement scores using as a denominator 
the score on each preceding task rather than the score on only the first 
task. Langer (75) expresses transfer as the ratio between the learning 
score obtained on one task and the score obtained on the preceding task, 
so that the expression represents “percentage of effort required for each 
subsequent learning in comparison with the antecedent learning.” 
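
The three ways of expressing cumulative transfer described above may be sketched as follows. The learning scores are hypothetical trials-to-learn values for four successive tasks, and we assume that percent improvement “in the usual manner” means the difference from the base score divided by the base score, with a lower score indicating better performance.

```python
def percent_improvement(base_score, later_score):
    """Improvement of later_score over base_score, as a percent of base_score.

    Assumes a measure on which a lower score is better (e.g., trials to learn).
    """
    return (base_score - later_score) / base_score * 100.0


# Hypothetical trials-to-learn scores on four successive tasks of the same type.
trials_to_learn = [40.0, 30.0, 24.0, 18.0]
pairs = list(zip(trials_to_learn, trials_to_learn[1:]))

# Base = the initial learning score (the usual manner).
vs_first = [percent_improvement(trials_to_learn[0], s) for s in trials_to_learn[1:]]

# Base = the score on each preceding task (Higginson's second expression).
vs_preceding = [percent_improvement(prev, cur) for prev, cur in pairs]

# Langer's ratio: each score as a percent of the preceding score.
langer_ratio = [cur / prev * 100.0 for prev, cur in pairs]

print(vs_first, vs_preceding, langer_ratio)
```

Under these assumptions Langer's ratio is simply 100 minus the improvement computed on the preceding task, so the two expressions carry the same information.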

Improvement has also been expressed by several investigators with 
the use of standard scores rather than percentages. As our previous 
discussion has indicated, such a measure is similar in its characteristics 
to that obtained when percent improvement is computed. Gates (43) 
expresses improvement in terms of multiples of the standard deviations 
of the distributions of ability in the initial tests. Mudge (87) uses a 
similar standard score expression to show the transfer between training 
in chemistry and a “functional” test in chemistry. Improvement is 
expressed in terms of the standard deviation on each of the two tests, 
and percent improvement as a ratio of the gain in the “functional” test 
to the gain in the training test. 

Transfer Expressed as Per Cent of Improvement in a Given 
Number of Learning Trials 


A number of investigators have made use of Formulas [3] and [4] to 
yield a measure of transfer as the percent that improvement due to 
transfer is of the percent of improvement accomplished in a given 
number of direct learning trials. The numerator of these expressions is 
taken to be the difference between the initial test of the practice series 
and the initial test of the transfer series (25, 53, 68, 121). The denom- 
inator is a measure of improvement of performance in a given number of 
trials of direct practice. The result, as Katona states, provides a 
comparison between the “transfer” effect and the “direct” effect. This 
measure of transfer is called the “transfer coefficient” by Katona. 
Starch (114) obtained a measure identical with this by expressing the 
ratio between the percent improvement due to transfer and the percent 
improvement due to direct practice in a given number of trials. Al- 
though in this case a percent improvement measure is used in both 
numerator and denominator, the expression is one which reduces to 
Formula [3]. 


Transfer Expressed as Percent of Total Learning 


Formula [5], which provides an expression of the ratio between 
amount of transfer and the total improvement possible by learning, has 
not been used very frequently in studies of transfer. Cluley (22) and 
Overman (96) make use of this expression, or at least of an expression 
which may be shown to be equivalent. In these two investigations, 
foretests were used as a means of matching groups. The actual formula 
in which the values were substituted took this form: 


\[
\frac{\text{Aftertest Score} - \text{Foretest Score}}{\text{Total Possible Score} - \text{Foretest Score}}
\]


It may be seen that this formula is essentially equivalent to Formula 
[5], since Aftertest Score = T Group Score, and Foretest Score = C 
Group Score. It may be noted that the expression contains a term 
representing total possible learning, which in these two instances was not 
directly measured but was assumed to be the total number of problems 
in arithmetic contained in the tests employed. Overman describes the 
resulting expression as “percentage of maximum possible transfer,” 
while Cluley refers to it as percentage gain, with the possible gain as the 
base. It will be noted that this method is comparable in many respects 
to the expression described in the preceding paragraph, which utilizes 
Formulas [3] or [4]. The essential difference between these two 
expressions is the use of the term representing total possible score in the 
denominator of Formula [5], rather than score in a given number of trials 
in the denominator of Formula [3]. Thus, the amount of transfer is 
related to the total improvement possible by direct practice rather than 
to the amount of improvement possible in a given number of direct 
practice trials. 
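
The difference between the two denominators may be illustrated with a brief sketch. All scores are hypothetical, the function names are ours, and the second function follows only the verbal description of the ratio given above for Formula [3] (transfer gain relative to the gain from a given number of direct practice trials); it is not offered as the exact published formula. Multiplying by 100 to obtain a percentage is also our assumption.

```python
def percent_of_possible_gain(foretest, aftertest, total_possible):
    """Gain as a percent of the total possible gain (the Cluley/Overman form)."""
    possible_gain = total_possible - foretest
    if possible_gain <= 0:
        raise ValueError("foretest score must be below the total possible score")
    return (aftertest - foretest) / possible_gain * 100.0


def percent_of_direct_practice_gain(transfer_gain, direct_gain):
    """Transfer gain as a percent of the gain made in a given number of
    direct practice trials (illustrative reading of the Formula [3] ratio)."""
    if direct_gain <= 0:
        raise ValueError("direct practice gain must be positive")
    return transfer_gain / direct_gain * 100.0


# Hypothetical arithmetic test of 50 problems.
foretest, aftertest, total_possible = 20.0, 35.0, 50.0
direct_gain_in_given_trials = 25.0   # assumed gain of a directly practiced group

relative_to_total = percent_of_possible_gain(foretest, aftertest, total_possible)
relative_to_direct = percent_of_direct_practice_gain(
    aftertest - foretest, direct_gain_in_given_trials
)
print(relative_to_total, relative_to_direct)
```

The same raw gain of 15 points thus yields different percentages according to whether it is referred to the total possible gain or to the gain made in a fixed amount of direct practice.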

Formula [6], which also provides for the relation of transfer to total 
possible learning, has been used by a number of investigators in express- 
ing transfer as percent savings or percent inhibition. As has been 
pointed out in a previous section, when a measure such as number of 
trials to learn is utilized, it is usually justifiable to assume that if learn- 
ing were complete, a perfect performance could be given in zero trials 
(or in one trial). The term which represents this measure of total pos- 
sible learning has, however, ordinarily been omitted. The result is to 
make the expression resemble Formula [2] which yields a measure of 
percent improvement. In the studies referred to as 16, 17, 19, 20, 21, 
64, 79, 97, 118, and 122, percent savings is computed as a ratio of the 
number of trials saved in the transfer group to the number of trials re- 
quired for direct learning in the control group. In other studies (18, 54, 
57, 58, 109) the number of trials required in the after-test is expressed as 
a percentage of the number of trials required for learning in the foretest. 
It is evident that these expressions may be interpreted either as percent 
gain or as the percent which transfer is of the total possible learning. 
Some reasons for preferring the latter meaning have been advanced in a 
previous section. 


Coefficients of Correlation as Indicators of Transfer 


Several investigators have used coefficients of correlation between 
tasks before and after training to indicate the presence and direction of 
transfer. Such expressions are seldom depended upon exclusively, since 
most of the studies provide other means of measuring the transfer which 
has occurred. A study by Yamane (135), seen in abstract, reports 
coefficients of correlation between “intermediate” and “standard” tasks 
in addition to a percentage measure of transfer involving a comparison 
of scores of the control and transfer groups. Winch (129) uses correla- 
tion measures to indicate the presence of transfer, together with percent 
improvement measures. Jackson (64) reports correlations between 
various amounts of pre-maze activity and number of trials to learn a 
maze, along with other measures of transfer. Mudge (87) computes 
correlation coefficients to show the extent to which gains in a training 
test and a “functional” test in chemistry tend to vary concomitantly. 
Mudge points out, however, that correlation is an unsatisfactory way of 
measuring amount of transfer, and consequently employs a percent 
improvement measure which has been mentioned previously. Alm (1) 
derives evidence of transfer from coefficients of correlation between 
performances on mazes which are of similar pattern but geometrically 
opposite. Wesman (123) obtains measures of correlation between 
intelligence test scores and achievement test scores at the beginning and 
end of a school year, and uses a measure of the significance of differences 
between these correlations as an indication of the existence of transfer. 
This author also reports the gains due to transfer in terms of raw score 
differences. 


SUMMARY 


The present article has presented a summary of the methods which 
have been used to give quantitative expression to measures of transfer 
of training. It has been pointed out that a number of factors in the 
various experimental situations which have been used to investigate 
transfer have exerted an influence upon the kind of expression which is 
given to transfer. Six relatively distinct methods of measuring transfer 
have been discovered in the literature, and some advantages and limita- 
tions of each have been described. Certain desirable criteria have been 
mentioned which transfer measures should possess for both theoretical 
and practical purposes. The extent to which each of the measures of 
transfer meets these criteria has been discussed. 

The findings of the present study are summarized in Table I. Col- 
umn 1 of this table lists the distinctive formulas which have been used 
to express amount of transfer. Succeeding columns show the theoretical 
limits of each formula, the values which the variables must have to 
indicate zero transfer, +100 and −100 percent transfer, and the variables 
in each expression which determine the value of transfer obtained from 
it. 

The theoretical limits of each formula indicate the extent of varia- 
tion which may be expected in the value obtained from each expression. 
An infinite limit, of course, is an impossibility in any study in which 
learning is measured in finite terms, but it indicates that unless a limit 
in amount of score possible can be obtained, there is no limit in the 
amount of transfer which might be computed. Actually, in each experi- 
ment the limit which transfer may have is determined by the highest 
score which has been obtained. It will be noted that in some cases there 
is a definite limit of 100 percent, minus 100 percent, or zero. No study 
can be expected to yield percentages above these values; but it is evident 
that in the cases of Formulas [1], [2], and [2b], the upper limit is de- 
termined by the absolute value of the particular learning measures which 
have been employed. In contrast, Formulas [5] and [6] yield definite 
upper limits of 100 percent which correspond to the total possible range 
in learning score. It will be noted that all of the measures summarized in 
Table I are in agreement as to the meaning of zero transfer. Zero trans- 
fer indicates that the control group after no training on an initial task 
is equal in performance to the transfer group which has had such previ- 
ous training. Formula [2b] indicates zero transfer by a value of 100 
percent (or 1.00), but the meaning of zero transfer is the same as with 
other measures. The wide variation in the meaning of 100 percent 
transfer in the various expressions may be noted in columns 3 and 4 of 
the table. 

Two criteria which seem desirable for measures of transfer have been 
discussed. These are the comparability of the transfer expressions 
between studies utilizing different learning tasks, and the comparability of 
transfer measures at different stages of the learning of both the initial 
and final tasks. The extent to which each formula is capable of satisfy- 
ing these criteria in any particular study may be judged from the final 
column of Table I which summarizes the effect of the measured factors 
in each situation upon the values of transfer obtained. The values of 
transfer computed by Formulas [1], [1a], [2], [2a], and [2b] vary in 
accordance with the absolute size of the units in which the improvement 
is measured. The values obtained from Formulas [3] and [4] have a cer- 
tain usefulness in making possible a comparison of the amount of trans- 
fer after a given number of trials of initial learning relative to the 
amount to be accomplished by direct practice in the same number of 
trials. The values obtained from them, however, vary with the absolute 
value of the learned performance after a particular number of trials; 
consequently, they do not permit a comparison to be made between 
studies in which a given number of trials accomplishes different amounts 
of the total learning. The values of transfer obtained from Formulas [5] 
and [6] relate the transfer obtained at different stages in the learning 
process, and in different learning tasks, to the total improvement pos- 
sible in each case. Negative transfer computed by means of these 
formulas is also related to the total amount of positive improvement 
(transfer or learning) possible. The previous discussion has indicated 
our belief that Formulas [5] and [6] yield the most useful expressions of 
transfer for most purposes. 


ANALYSIS OF VARIANCE—REPEATED MEASUREMENTS 


LEONARD S. KOGAN 
Institute of Welfare Research, Community Service Society of New York 


Previous articles in the Psychological Bulletin (1, 3, 4, 8) have dis- 
cussed many applications of analysis of variance techniques to areas of 
psychological and educational research. In these articles the problem 
arising when repeated measurements are taken on independent groups 
of subjects has been mentioned but no adequate method of analysis has 
been offered except by Alexander (1). A more recent article by Lind- 
quist (6) also treats of this problem among others of a similar nature. 

Since repetitive measurement is so common in experimental design, 
it is the purpose of the present paper to outline and discuss an approach 
to the problem somewhat different from Alexander’s. His excellently 
developed technique involves the assumption that the best estimate 
of population error is an estimate of the “... distribution of 
repetitions for a single individual at a single moment of the experiment” 
and further “... that any pair of such distributions are independent 
or uncorrelated” (1, p. 535). This error estimate is derived from the 
individual deviations corrected for individual and group linearity. The 
basic assumption that the distributions are uncorrelated may be 
questioned, however, and many researchers would feel happier if the estimate 
of population error took account of correlation between the successive 
arrays when present. The method to be discussed below, although it 
does not handle the complex situation of varying individual slopes as 
does Alexander’s procedure, attempts to take account of the presence 
of correlation between the successive measures. 

Lindquist deals with the problem of testing whether the successive 
means of two independent groups are “significantly different” by working 
with two sub-hypotheses (6, pp. 76-77). The first sub-hypothesis is 
that the difference between each of the corresponding successive means 
is the same and the second sub-hypothesis is that there is no difference 
between the general means of the two samples. His method represents 
a special case of the present approach, which deals with n groups, and 
will be incorporated in the discussion to follow. 

As anyone who has worked with this type of problem will testify, 
the solution is far from being a matter of mere mechanics and this paper 
has been written chiefly to emphasize the non-routine nature of the 
problem and to inspire further thought and discussion. 

TYPES OF STUDIES INVOLVING REPEATED MEASUREMENTS 

Some of the types of investigation in which the problem of repeated 
measurements arises are: 


1. Examination of learning curves of various experimental groups on the 
same task. 

2. Investigation of successive ratings (e.g., inspectors’ ratings of flight stu- 
dents at stages of training) for groups trained by various methods. 

3. Analysis of sequential measures of physiological or sensory-motor func- 
tions for varying treatment groups. 


In general, the situation presents itself whenever over-all significances 
of differences of group means derived from a time sequence are being 
investigated. 


TABLE I 

SKELETON OUTLINE OF DATA FOR A PROBLEM INVOLVING 
SEVERAL GROUPS WITH REPETITIVE MEASURES 

A skeleton example of such a problem is presented in Table I where 
five groups of subjects have been trained by different methods and 
measures have been obtained in 13 successive trials. Two major sources 
of variation in the scores can at once be postulated.¹ These are (1) a 
variation associated with time or trials usually designated as “learning,” 
(2) a variation related to method of training. The number of scores for 
each subject is identical but there may be different numbers of subjects 
in each methods group. Analysis is facilitated, of course, when there are 
equal numbers of subjects in the several methods groups. 


A METHOD OF ANALYSIS OF VARIANCE FOR 
REPEATED MEASUREMENTS 


The standard three-way analysis of variance does not apply in this 
case because any given individual’s scores appear in only one of the 
methods groups. What is presented is rather a two-factor analysis, com- 
plicated by a within-among classes subdivision. One would, moreover, 
have to justify the use of the same error estimate as the basis of an 
F-test, say for comparison of successive mean trial scores (learning) and 
for the means of methods. In an F-test for successive measures one 
should provide for the possibility of correlation between the measures, 
while in applying the F-test for methods the values can be considered 
independent since they are obtained from different subjects. 

Let us consider some of the variance estimates which can be derived 
from the data. Three sources of population variance estimate are avail- 
able if the null hypotheses hold. These are (1) a variance based on means 
of methods (M), (2) a variance based on means of successive trials (T) 
and (3) a variance based on M/T interaction. Since a mean score can 
also be obtained for each subject (viz., 138 means) it is also possible to 
make a variance estimate from these values which might be called an 
inter-subject variance estimate. If the sums of squares and degrees of 
freedom whereby these four variance estimates are derived are removed 
from the total sum squares and degrees of freedom, the remaining term 
may be designated as a residual intra-subject estimate of variance when 
divided by its degrees of freedom. 

The equation from which the above variance estimates are derived 
is as follows (without multiple summation signs or constant multipliers): 

\[
\begin{aligned}
\sum (y_{SMT} - \bar{y})^2 ={}& \sum (\bar{y}_M - \bar{y})^2 + \sum (\bar{y}_T - \bar{y})^2 + \sum (\bar{y}_S - \bar{y})^2 \\
&+ \sum (\bar{y}_{MT} - \bar{y}_M - \bar{y}_T + \bar{y})^2 \\
&+ \sum (y_{SMT} - \bar{y}_S - \bar{y}_{MT} + \bar{y})^2 \qquad [1]
\end{aligned}
\]

where $y_{SMT}$ = any single score 
$\bar{y}$ = grand mean 
$\bar{y}$ with a subscript = a subgroup mean 
S = subject, M = method, T = trial 
$n$ = number of subjects; $n_M$ = number of methods 
$n_T$ = number of trials 
$N$ = total number of measures 

¹ It is assumed that the means of the initial or pre-experimental measures are 
comparable for the groups so that covariance adjustments are not obligatory. This initial 
comparability of group means and variances is usually a result of randomization or 
controlled distribution of subjects. 


The analysis of variance setup from equation [1] is as below: 


Source of Variation    SS    df                        df for problem    Var. Est.
Methods                --    n_M - 1                   4                 --
Trials                 --    n_T - 1                   12                --
M × T Interaction      --    (n_M - 1)(n_T - 1)        48                --
Inter-Subject          --    n - 1                     137               --
Residual               --    n_T(n - n_M) - (n - 1)    1592              --

Total                  --    N - 1                     1793


The problem, obviously, resolves itself into the choice of valid error 
variances with which to compare the variance estimates made from 
methods, trials, and M/T interaction. 

First, let us consider the M/T interaction itself as a possible estimate 
of population error. Such a procedure would be based on the assumption 
that this interaction provides a fair estimate of chance variation since 
it is the remainder after possible effects due to methods and “learning” 
have been removed. Inasmuch as interaction variance is directly related 
to the average intercorrelation among columns, its use as an estimate of 
error is somewhat equivalent to the classical technique for comparing 
means based on correlated scores. Here the formula for the standard 
error of the difference between matched groups or paired sets of means is 
generalized and a common variance is estimated from the group trial 
means (5). In this case the data for columns (trials) are not merely 
matched but are measures obtained from the same subjects and hence 
the inclusion of the correlation term in testing successive means is ob- 
ligatory. However, one might question the use of this interaction vari- 
ance as error in testing the significance of methods differences because 
the methods groups are composed of different subjects and no correction 
for correlation should be made. 

In any event, if the interaction term were used as an error estimate 
it would seem that we are discarding in large degree the possibility of 
making an error estimate from the replications within M/T cells. Ordi- 
narily, one might tend to make use of the replications as a basis for an 
error estimate when replication is part of the experimental design and 
the separate cases are felt to be comparable. 

The question of just how to obtain an error estimate from the repli- 
cations is difficult to settle. If we subtract sums of squares and degrees 
of freedom from the total for trials, methods, and M/T interaction, just 
what is left? It is true that an error estimate based on this type of 
residual might be considered to be the average variance within M/T 
cells and thus free from the possible effects of methods and “learning.” 
If scores for the same individuals were not present across the rows this 
would probably be a good estimate of error. But this “within M/T” 
variance is divisible into a portion which might be called an inter- 
subject variance and a remainder which might be called an intra-subject 
variance. The inter-subject variance was obtained from the variation 
of the individual subject means around the grand mean. Should this 
inter-subject variance be the error estimate to use, say, for comparison 
with the variance estimate based on methods means? Offhand, one 
might regard this inter-subject variance as a valid estimate of chance 
variation. However, one consideration must be taken as strong negative 
evidence against doing this. This is the fact that any possible real effects 
of varying methods on the learning measures will also be present in this 
inter-subject variance, inasmuch as both estimates are obtained from 
the same axis of classification. Thus if real methods differences exist 
they would be reflected in the inter-subject variance and merely assum- 
ing the null hypothesis with respect to methods groups would not alter 
the facts. 

Similarly, the residual term remaining after methods, trials, M × T 
interaction, and inter-subject variances have been partialled out cannot 
be considered a valid estimate of error. This is so because there is the 
likelihood of having subtracted the methods sums squares twice from 
the total, the first time directly, and the second time as part and parcel 
of the so-called inter-subject variance. 

Most of the difficulties inherent in the above equation would have 
been apparent if the basic equation had been derived from fundamental 
elements. It would then have been obvious that the first and third terms 
of the right-hand member are not independent. However, many 
psychologists are not sufficiently mathematically sophisticated to perform 
this derivation and very often they will be satisfied if the two sides of the 
equation are equivalent. 

A more adequate breakdown of the total sums of squares would 
involve not taking the deviations of the individual subject means from 
the grand mean but from their own methods means. If this is done it 
means replacing the term $\sum (\bar{y}_S - \bar{y})^2$ by $\sum (\bar{y}_S - \bar{y}_M)^2$ and adjusting the 
residual term. This alternative type of inter-subject variance precludes 
the possibility of subtracting the methods sums squares twice from the 
total. If, on one hand, the methods means do not differ significantly 
from each other, the analysis is for all practical purposes the same as 
before, but if, on the other hand, the methods means do differ signifi- 
cantly, these real differences will not be involved in the sums of squares 
based on the individual subject means. 
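
The reason the methods sums squares can be subtracted twice may be written out as a short identity. This is a sketch in the notation of the surrounding equations with the constant multipliers restored ($n_T$ trials per subject, $n_{SM}$ subjects in a methods group); the inner summation over the subjects $S$ belonging to method $M$ is our own labeling.

\[
\begin{aligned}
n_T \sum_{S} (\bar{y}_S - \bar{y})^2
  &= n_T \sum_{M} \sum_{S \in M} \bigl[ (\bar{y}_S - \bar{y}_M) + (\bar{y}_M - \bar{y}) \bigr]^2 \\
  &= n_T \sum_{M} \sum_{S \in M} (\bar{y}_S - \bar{y}_M)^2
     + \sum_{M} n_{SM}\, n_T\, (\bar{y}_M - \bar{y})^2 .
\end{aligned}
\]

The cross-product term vanishes because $\sum_{S \in M} (\bar{y}_S - \bar{y}_M) = 0$ within each methods group. The last term is the sum of squares for methods, so deviations of subject means taken from the grand mean contain the methods variation in full, whereas deviations taken from the subjects' own methods means do not.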

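The equation for this alternative breakdown is as follows: 

\[
\begin{aligned}
\sum (y_{SMT} - \bar{y})^2 ={}& \sum (\bar{y}_M - \bar{y})^2 + \sum (\bar{y}_T - \bar{y})^2 + \sum (\bar{y}_S - \bar{y}_M)^2 \\
&+ \sum (\bar{y}_{MT} - \bar{y}_M - \bar{y}_T + \bar{y})^2 \\
&+ \sum (y_{SMT} - \bar{y}_{MT} - \bar{y}_S + \bar{y}_M)^2 \qquad [2]
\end{aligned}
\]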


The computational outline and the analysis of variance from the 
above equation are as follows: 


(1) Correction: $C = (\sum y)^2 / N$ 

(2) Total SS $= \sum y^2 - C$ 

(3) $SS_M = \sum_{M} (\sum y_M)^2 / n_{SMT} - C$ 

(4) $SS_T = \sum_{T} (\sum y_T)^2 / n - C$ 

(5) $SS_{MT} = \sum_{MT} (\sum y_{MT})^2 / n_{SM} - \sum_{M} (\sum y_M)^2 / n_{SMT} - \sum_{T} (\sum y_T)^2 / n + C$ 

(6) $SS_S = \sum_{S} (\sum y_S)^2 / n_T - \sum_{M} (\sum y_M)^2 / n_{SMT}$ 

(7) $SS_I$ = (2) − (3) − (4) − (5) − (6) 

(8) $SS_C$ = (6) + (7) 


where $N$ = total number of measures ($= n\,n_T$) 
$n$ = number of subjects 
$n_T$ = number of trials per subject 
$n_{SM}$ = number of subjects in a methods group 
$n_{SMT}$ = number of subjects in a methods group × number of trials 
$n_M$ = number of methods 
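
The computational outline can be followed step by step on a small set of scores. The sketch below is only an illustration under stated assumptions: the data are hypothetical (two methods, two subjects per method, three trials), the data layout and the function name `repeated_measures_ss` are ours, and equal numbers of subjects per methods group are used so that the interaction term in step (5) is unambiguous.

```python
def repeated_measures_ss(groups):
    """Sums of squares following steps (1)-(8) of the computational outline.

    groups maps a method label to a list of subjects; each subject is a list
    of n_T trial scores (the same number of trials for every subject).
    """
    subjects_by_method = list(groups.values())
    n_T = len(subjects_by_method[0][0])
    scores = [y for subs in subjects_by_method for subj in subs for y in subj]
    N = len(scores)
    n = sum(len(subs) for subs in subjects_by_method)

    C = sum(scores) ** 2 / N                                   # step (1)
    total = sum(y * y for y in scores) - C                     # step (2)

    # Squared method totals, each divided by n_SMT = subjects in group x trials.
    method_term = sum(
        sum(sum(subj) for subj in subs) ** 2 / (len(subs) * n_T)
        for subs in subjects_by_method
    )
    ss_M = method_term - C                                     # step (3)

    # Squared trial totals, each divided by the total number of subjects.
    trial_term = sum(
        sum(subj[t] for subs in subjects_by_method for subj in subs) ** 2 / n
        for t in range(n_T)
    )
    ss_T = trial_term - C                                      # step (4)

    # Squared method-by-trial cell totals, each divided by n_SM.
    cell_term = sum(
        sum(subj[t] for subj in subs) ** 2 / len(subs)
        for subs in subjects_by_method
        for t in range(n_T)
    )
    ss_MT = cell_term - method_term - trial_term + C           # step (5)

    # Squared subject totals, referred to the subjects' own methods totals.
    subject_term = sum(
        sum(subj) ** 2 / n_T for subs in subjects_by_method for subj in subs
    )
    ss_S = subject_term - method_term                          # step (6)

    ss_I = total - ss_M - ss_T - ss_MT - ss_S                  # step (7)
    ss_C = ss_S + ss_I                                         # step (8)
    return {"total": total, "M": ss_M, "T": ss_T, "MT": ss_MT,
            "S": ss_S, "I": ss_I, "C": ss_C}


# Hypothetical scores: two methods, two subjects each, three trials.
example = {
    "A": [[1, 2, 3], [2, 3, 5]],
    "B": [[2, 4, 6], [3, 5, 8]],
}
ss = repeated_measures_ss(example)
print(ss)
```

Because the intra-subject term is obtained by subtraction, the five components necessarily add to the total; the check of interest is that each component is non-negative and that the inter-subject term is taken about the methods means rather than the grand mean.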


The analysis of variance scheme for equation [2] is as follows: 


Source of Variation         SS     df                     df for problem    Var. Est.
Methods (M)                 (3)    n_M - 1                4                 V_M
Trials (T)                  (4)    n_T - 1                12                V_T
M × T Interaction (MT)      (5)    (n_M - 1)(n_T - 1)     48                V_MT
Composite error (C)         (8)    n_T(n - n_M)           1729              V_C
  Inter-subject (S)         (6)    n - n_M                133               V_S
  Intra-subject (I)         (7)    (n - n_M)(n_T - 1)     1596              V_I
Total                       (2)    N - 1                  1793
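
The degrees of freedom in the two schemes can be verified by direct arithmetic. The sketch below uses the values of the illustrative problem ($n = 138$ subjects, $n_M = 5$ methods, $n_T = 13$ trials); the function names are ours.

```python
def df_first_breakdown(n, n_M, n_T):
    """Degrees of freedom for equation [1] (subject means about the grand mean)."""
    N = n * n_T
    return {
        "M": n_M - 1,
        "T": n_T - 1,
        "MT": (n_M - 1) * (n_T - 1),
        "S": n - 1,
        "residual": n_T * (n - n_M) - (n - 1),
        "total": N - 1,
    }


def df_alternative_breakdown(n, n_M, n_T):
    """Degrees of freedom for equation [2] (subject means about methods means)."""
    N = n * n_T
    return {
        "M": n_M - 1,
        "T": n_T - 1,
        "MT": (n_M - 1) * (n_T - 1),
        "S": n - n_M,
        "I": (n - n_M) * (n_T - 1),
        "C": n_T * (n - n_M),
        "total": N - 1,
    }


first = df_first_breakdown(138, 5, 13)
alternative = df_alternative_breakdown(138, 5, 13)
print(first)
print(alternative)
```

The inter-subject degrees of freedom differ between the two schemes by exactly $n_M - 1$, the degrees of freedom for methods, which is the counterpart in degrees of freedom of the double subtraction of the methods sums squares.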


If we now generalize Lindquist’s tests for two independent groups 
(6, pp. 76-77) to several groups we would use two F-ratios in determining 
whether the successive population means coincide for each trial. 
First we would test $F = V_{MT}/V_I$ in order to determine whether the 
differences between the corresponding trial means for the various methods 
groups are the same. Secondly we would use $F = V_M/V_S$ to test whether 
the differences between the general methods means are not significantly 
different from zero. Now if either sub-hypothesis must be rejected, then 
of course the general hypothesis that the successive means coincide for 
each trial must be rejected also. 
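
The two generalized ratios are obtained by dividing each sum of squares by its degrees of freedom and forming the quotients named above. The following sketch uses hypothetical sums of squares and degrees of freedom for a very small design; the dictionary keys and the function name are ours, and the comparison of each ratio with a tabled F value is left to the reader.

```python
def lindquist_f_ratios(ss, df):
    """Return the two F-ratios of the generalized Lindquist tests.

    ss and df are dictionaries keyed by 'M', 'MT', 'S' and 'I'
    (methods, interaction, inter-subject, intra-subject).
    """
    v = {key: ss[key] / df[key] for key in ("M", "MT", "S", "I")}
    return {
        "interaction": v["MT"] / v["I"],   # F = V_MT / V_I
        "methods": v["M"] / v["S"],        # F = V_M / V_S
    }


# Hypothetical values for illustration only.
ss_example = {"M": 12.0, "MT": 2.0, "S": 16.0 / 3.0, "I": 2.0 / 3.0}
df_example = {"M": 1, "MT": 2, "S": 2, "I": 4}

f_ratios = lindquist_f_ratios(ss_example, df_example)
print(f_ratios)
```

Each ratio would be referred to the F table with the degrees of freedom of its own numerator and denominator, for example 2 and 4 for the interaction ratio in this small illustration.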

This is as far as Lindquist goes in handling this type of problem but 
it is apparent that further tests can be made with useful results. Let 
us deal with the tests involving $V_M$ first. In addition to getting 
$F = V_M/V_S$ let us also obtain $F = V_M/V_I$. If both of these F-tests turn out 
to be significant the interpretation is straightforward that neither the 
variation within nor between subjects is sufficient to account for the 
variation of the method means. This would increase our confidence in 
concluding that some of the general means are “significantly different” 
from others. If $V_M/V_I$, only, is significant the interpretation would 
probably be that inter-subject variance is the only significant source of 
methods variation. If $V_M/V_S$ but not $V_M/V_I$ were significant one 
would infer that intra-subject variance is the only significant source of 
methods variation. As will be pointed out below, such a situation might 
occur if there were little or no correlation between successive trials. If 
neither F-ratio is significant one would infer that the methods variation 
is attributable to these other sources and reject the hypothesis that 
there are significant differences between the general methods means. 

With regard to the M/T interaction variance one would likewise 
probably wish to test it against both the intra-subject and inter-subject 
variances in order to render one’s conclusions more precise. One might 
also argue for testing it against what we have called the “composite 
error” term (based on pooling inter-subjects and intra-subjects sums 
squares) since it is affected both by differences between trials and dif- 
ferences between methods. If it tests significantly (say P<.01) it 
means that there has been a differential effect among the methods with 
respect to the successive trials. In other words, the learning curves for 
the different methods show differences which may mean that learning 
has been faster for some of the methods. 

The remainder of the discussion on testing the methods variance 
estimate is a tentative analytic procedure which attempts to perform 
the above tests in such a way that the dependence of the conclusions on 
the presence or absence of “significant” correlation between successive 
measures is made more apparent. 

Let us examine the term $\sum (\bar{y}_S - \bar{y}_M)^2$ in the equation. It is a term 
which is closely related to the correlation between trials. If the succes- 
sive measures for each subject within a methods group are totally 
independent one could use this term to estimate $\sigma_S^2$; such an estimate 
would tend to approach a minimum if no inter-trial correlation were 
present since the subject means would tend to be equal to the methods 
mean. If, however, the successive measures are correlated this term 
becomes larger since the individual subject means would then tend to 
form a distribution around the methods mean. In the case of perfect 
positive correlation between successive trials this term would be 
maximized while the residual intra-subject variance would be minimized 
since it is the only other term free to vary. Thus, as one would expect, 
the intra-individual variance would tend to decrease with increasing 
inter-dependence of successive scores. This intra-individual variance is 
closely allied to the residual usually obtained in a two-way analysis 
where column and row sums squares are subtracted from total sums 
squares—in this case, the residual has been generalized over all the 
methods (5). 

Since the inter-subject variance and the intra-subject variance are 
so closely related to the correlation between trials, if we use $F = V_S/V_I$ 
we have a fair basis for stating whether or not the inter-trials correlation 
is significant.² 

² In the appendix a formula for estimating the average correlation between columns 
(trials) is provided and the reason for this statement becomes obvious. 

If the correlation is concluded to be “not significant” we may use 
the “composite error” term with reasonable confidence as a population 
error estimate for testing the methods variance ($F = V_M/V_C$). The 
composite error term could be regarded as the result of “pooling” not 
significantly different sums squares and df’s. This “pooling” could further 
be defended because the successive measures can be regarded as 
independent if the correlation is not significant and we can treat the 
measures as if they had been obtained from different subjects. 

However, if the correlation is found to be significant (say P < .01) 
it is more difficult to decide what to use as the error estimate. We would 
certainly not be justified in “pooling” the intra-subject and inter-subject 
sums squares here into a “composite error” term, since its components 
would be significantly different and such an estimate would be 
largely weighted with a small intra-subject variance with too many 
degrees of freedom.³ Therefore, the more appropriate term to use as the 
denominator of the F-ratio when the correlation is significant is the 
inter-subject variance, based as it is on the deviations of subject means 
from their respective methods means. This latter $F = V_M/V_S$ is the 
generalization of Lindquist’s test for the methods difference. 
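
The choice of error term described in the two preceding paragraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_error_term`, its argument names, and the numerical values are hypothetical, and the critical value of F for the preliminary test is assumed to be read from an F table for the appropriate degrees of freedom rather than computed.

```python
def choose_error_term(v_m, v_s, v_i, v_c, f_crit_corr):
    """Pick the denominator for the methods F-ratio.

    v_m, v_s, v_i, v_c: methods, inter-subject, intra-subject and
    composite variance estimates (argument names are illustrative).
    f_crit_corr: tabled critical F for V_S/V_I at the chosen level,
    assumed to be supplied by the user (e.g. for P = .01).
    """
    f_corr = v_s / v_i  # preliminary test of the inter-trials correlation
    if f_corr >= f_crit_corr:
        # Correlation significant: use the inter-subject variance as error.
        return {"f_corr": f_corr, "error": "inter-subject",
                "f_methods": v_m / v_s}
    # Correlation not significant: the pooled (composite) error is defensible.
    return {"f_corr": f_corr, "error": "composite",
            "f_methods": v_m / v_c}


# Hypothetical variance estimates, for illustration only.
result = choose_error_term(v_m=120.0, v_s=40.0, v_i=5.0, v_c=12.0,
                           f_crit_corr=2.5)
print(result["error"], result["f_methods"])
```

The methods F-ratio returned by the sketch must still be referred to an F table with the degrees of freedom belonging to whichever error term was chosen.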

Let us suppose now that the F-ratio of the methods variance and the 
proper “error” estimate yields a P-level below .01. This would indicate 
that at least some of the methods have produced significantly differentiable 
effects. If one of the groups is a control group we may be interested 
in comparing this group with some of the other groups. This may 
be done either by comparison of the individual method means with 
t-tests based on the error variance as derived from the analysis of variance 
(4, p. 133) or by redoing analyses of variance using only the groups 
in which we are specifically interested. If mean learning curves for the 
different groups are drawn on the same graph it is often possible to pick 
out the group comparisons which are most pertinent. Similarly, if we 
are interested in a more precise investigation of the mean curves for each 
methods group we may wish to perform a standard two-way analysis of 
variance for each group.⁴ While the two-way analysis may not be so 
valuable for learning problems, where we are usually confident that 
significant changes occur with time, it would be relevant to situations 
such as the investigation of measures of drive where cyclical variations 
are often encountered. The chief value of the type of analysis presented 
in this paper is for investigating over-all differences among groups treated 
differently. It is likewise in no sense intended to serve as an easy substitute 
for the important necessity of studying the curves for individuals 
and groups. 

³ It is this matter of degrees of freedom which seems to be the chief problem here. The 
number of degrees of freedom to be associated with the “composite error” estimate should 
logically be lowered if there is “significant” correlation between trials. We have toyed 
with the formula: 

$$df_{\text{composite error}} = df_{\text{inter-subject}} + \sqrt{1 - \bar{r}}\; df_{\text{intra-subject}}$$

This would relate the degrees of freedom lost because of inter-trials correlation to the 
standard error of measurement formula but we have not been able to derive any such 
equation in an orderly fashion. 

⁴ Alexander’s more general tests for trend would be relevant here (1). 

In applying an analysis of variance along the lines of that discussed 
above one major point must always be considered: whether or not the 
data for the individuals and the varying groups can be assumed to have 
been drawn from a homogeneous, normal parent population. Because 
of the nature of the problem it is obvious that the successive trials of 
the same subjects are not likely to be independent and thus one must 
always make a practical decision as to whether an over-all analysis of 
variance can be justified at all. Statistical tests are available for testing 
normality and homogeneity of variance, and in general if the ranges of 
values do not shift markedly from trial to trial an analysis of variance 
may lead to relevant hypotheses about the data. In learning experiments 
in particular, however, it is quite common for the variance 
to show a gradual constriction over a period of observation. If this constriction 
of variance is apparent, or especially if the successive variances 
tend to be proportional to the successive means, certain transformations 
can often be made from the “raw scores,” such as the substitution of 
logarithms or ranks for obtained scores. Likewise if the data are expressed 
in terms of percentages, a transformation to angles is often 
advisable. Situations of this type and the resulting loss of efficiency 
when transformations are made are discussed by Snedecor (10) and 
Smith and Duncan (9). 
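
The two transformations mentioned above can be illustrated with a short Python sketch. It assumes that the "transformation to angles" is the usual angular transformation, arcsin of the square root of the proportion; the function names and the scores are hypothetical, and the example is constructed so that the spread of the raw scores shrinks in step with their mean.

```python
import math


def log_transform(scores):
    """Replace each positive raw score by its common logarithm."""
    return [math.log10(x) for x in scores]


def angular_transform(percentages):
    """Replace each percentage p by arcsin(sqrt(p / 100)), in degrees."""
    return [math.degrees(math.asin(math.sqrt(p / 100.0)))
            for p in percentages]


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Hypothetical error scores on an early and a late trial: both the mean
# and the spread shrink with practice.
early = [80.0, 100.0, 125.0]
late = [8.0, 10.0, 12.5]

print(variance(early), variance(late))                # very unequal
print(variance(log_transform(early)),
      variance(log_transform(late)))                  # nearly identical
print(angular_transform([0.0, 50.0, 100.0]))          # angles in degrees
```

In this constructed case the logarithms have the same variance on both trials because the late scores are a constant fraction of the early ones; real data will rarely behave so neatly, and the variances should be inspected again after any transformation.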


SUMMARY 


The difficulties inherent in performing an over-all analysis of variance 
when repeated measurements are taken on several independent 
groups are discussed. The general aim of such an analysis is to decide 
whether to reject the null hypothesis that several “treatment” or 
“methods” groups exhibit no differences in general trend. Suggestions 
on the use and limitations of several approaches to the problem are 
made. 

APPENDIX 

Application of Analysis of Variance Sums Squares or Variance Estimates to 
Obtain Inter-Trial (Intercolumnar) Correlation. In order to estimate an average 
inter-trial correlation one should restrict his attention to one methods group 
at a time. If the values over the entire columns of Table I are used as a basis 
for this correlation estimation a spuriously high r may be found due to actual 
methods differences. 


A tabulation of the scores within one methods group would appear as in 
Table II. 

TABLE II 
SKELETON OUTLINE OF DATA FOR SINGLE METHODS GROUP 

                                          Trials 
              Individuals    1          2          3          ...    n_T 

                   1         y_{1,1}    y_{1,2}    y_{1,3}    ...    y_{1,n_T} 
  One              2 
  Methods          . 
  Group            . 
                   m         y_{m,1}    ...                          y_{m,n_T} 

An expression ($\bar{r}$) of the average inter-column correlation as demonstrated 
by Peters and Van Voorhis (5, pp. 196-201) is as follows: 

$$\bar{r} = \frac{a(\sigma_m^2/\sigma_c^2) - 1}{a - 1}$$

where a = number of columns (trials) 
      $\sigma_m^2$ = variance of means of rows (subjects) 
      $\sigma_c^2$ = average within-columns variance 

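In terms of an analysis of variance of the above Table II: 

$$SS_{\text{among subjects}} = \frac{m \sum_s \left(\sum_t y_{st}\right)^2 - \left(\sum_s \sum_t y_{st}\right)^2}{m\,n_T}$$

$$SS_{\text{within trials}} = \frac{m \sum_s \sum_t y_{st}^2 - \sum_t \left(\sum_s y_{st}\right)^2}{m}$$

where m = number of individuals in the group and $n_T$ = number of trials (the a of the preceding formula). 
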
From these expressions for sums squares it is easy to show that: 

$$\bar{r} = \frac{n_T\left(SS_{\text{among subjects}}/SS_{\text{within trials}}\right) - 1}{n_T - 1}$$



When there are several methods groups (as in the example discussed in the 
body of this paper) we may obtain an over-all $\bar{r}$ by using weighted z's for the 
$\bar{r}$'s of each group, or we can find an unweighted mean for $SS_{\text{among subjects}}$ from 
$SS_S/n_M$ and likewise a mean for $SS_{\text{within trials}}$ if we use $SS_C/n_M$. Thus to estimate 
an over-all correlation among successive trials 

$$\bar{r} = \frac{n_T\,\dfrac{SS_S/n_M}{SS_C/n_M} - 1}{n_T - 1} = \frac{n_T(SS_S/SS_C) - 1}{n_T - 1}.$$

From this equation the formula for $\bar{r}$ can be derived in terms of variance 
estimates: 

$$\begin{aligned}
\bar{r} &= \frac{n_T(SS_S/SS_C) - 1}{n_T - 1} = \frac{n_T(SS_C/SS_C - SS_I/SS_C) - 1}{n_T - 1} \\
&= \frac{n_T(1 - SS_I/SS_C) - 1}{n_T - 1} = \frac{n_T\left(1 - \dfrac{df_I V_I}{df_C V_C}\right) - 1}{n_T - 1} \\
&= \frac{n_T - n_T(df_I/df_C)(V_I/V_C) - 1}{n_T - 1} = \frac{n_T - 1}{n_T - 1} - \frac{n_T(df_I/df_C)(V_I/V_C)}{n_T - 1} \\
&= 1 - \frac{n_T}{n_T - 1}(df_I/df_C)(V_I/V_C) = 1 - \frac{n_T(n_T - 1)(n - n_M)}{(n_T - 1)\,n_T(n - n_M)} \cdot \frac{V_I}{V_C} \\
&= 1 - V_I/V_C.\,{}^{5}
\end{aligned}$$

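
As a numerical check on the appendix formulas, the following Python sketch computes the average inter-trial correlation for a single methods group in two ways: from the sums squares, and from the variance estimates as 1 - V_I/V_C. The function name and the scores are hypothetical. For a single group the sketch assumes df_C = n_T(m - 1) and df_I = (n_T - 1)(m - 1), where m is the number of subjects in the group; this is an assumption about the degrees of freedom consistent with the derivation above, not a quotation from it.

```python
def average_intertrial_r(scores):
    """Average inter-trial correlation for ONE methods group.

    scores: list of rows, one row per subject, one column per trial.
    Returns (r from sums squares, r from variance estimates).
    """
    m = len(scores)        # subjects in the group
    n_t = len(scores[0])   # trials
    grand = sum(sum(row) for row in scores) / (m * n_t)
    row_means = [sum(row) / n_t for row in scores]
    col_means = [sum(row[t] for row in scores) / m for t in range(n_t)]

    # Among-subjects and within-trials sums squares.
    ss_among = n_t * sum((rm - grand) ** 2 for rm in row_means)
    ss_within = sum((row[t] - col_means[t]) ** 2
                    for row in scores for t in range(n_t))

    r_from_ss = (n_t * ss_among / ss_within - 1) / (n_t - 1)

    # Assumed degrees of freedom for a single group (see lead-in).
    v_c = ss_within / (n_t * (m - 1))
    v_i = (ss_within - ss_among) / ((n_t - 1) * (m - 1))
    r_from_var = 1 - v_i / v_c
    return r_from_ss, r_from_var


# Hypothetical scores: 4 subjects by 3 trials.
scores = [[3.0, 5.0, 6.0],
          [2.0, 3.0, 5.0],
          [6.0, 7.0, 9.0],
          [4.0, 4.0, 8.0]]
r_ss, r_var = average_intertrial_r(scores)
print(round(r_ss, 4), round(r_var, 4))
```

The two values agree for any data set, which is the identity the derivation establishes; when every subject keeps the same standing on every trial, both reach 1.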


⁵ Peters (6, p. 293) shows that $S_w^2 = (1 - \bar{r})S^2$ where 
$S_w^2$ = population variance estimate from three part analysis (rows, columns, interaction) 
$S^2$ = population variance estimate from two part analysis (within and among columns) 

Thus $\bar{r}$ is merely this relationship generalized over all the methods. 

Alexander (2, p. 97) further demonstrates that such an average product moment r 
is equivalent to the intra-class correlation adjusted for trend. His article dealing with 
various estimates of reliability when several trials are available is a major contribution to 
the topic of repeated measurements. 

















PERSONNEL RESEARCH AND TEST DEVELOPMENT 
IN THE BUREAU OF NAVAL PERSONNEL 


A SPECIAL REVIEW* 


JOHN C. FLANAGAN 
University of Pittsburgh 


Many psychologists participated during the past war in programs 
involving research and development on personnel problems of the 
military services. The experience obtained in these programs was no- 
table for several reasons. Psychologists with diverse backgrounds were 
brought together to work on common problems. This made possible a 
cross-fertilization of methods and information which, in many cases, 
yielded valuable outcomes. Another important characteristic of the 
situation was the stress on practical problems and tangible results. An 
important advantage consisted of the large number of cases available 
for study and the unusual degree of control which could be exercised 
with respect to the individuals involved. A report of the work of one of 
these groups is therefore of considerable general interest, and the editor 
and contributors of this book are to be commended for presenting this 
material. The Preface states that “The purpose of this book is to present 
an evaluative summary of the personnel research and test development 
completed during World War II by the Test and Research Section of 
the Bureau of Naval Personnel in cooperation with the National Defense 
Research Committee Project N-106 and the College Entrance 
Examination Board” (p. ix). 

The Test and Research Section was one of the last personnel research 
groups to be organized in the services. It did not begin functioning until 
November, 1942, almost a full year after the United States entered the 
war. Because of this situation there are frequent references throughout 
the book to work planned but not carried through because of the termination 
of hostilities. There is also considerable emphasis on “the lack of 
a carefully planned, systematic approach to the study of the Navy’s 
personnel problems before and during World War II” (p. 454). Another 
consequence of this late start was that the group was unable to participate 
in the development of the special personnel selection, classification, 
and training programs for the emergency. The decisions with respect 
to most of these problems had been made and the programs established 
before the psychologists arrived. This is emphasized in the concluding 
chapter by the statement, “What personnel research was accomplished 
during the war on the problems of training and of selection and classification 
was performed under the hampering restriction that it should 
interfere as little as possible with the personnel procedures already in 
use and the routines already established” (p. 450). A further source of 
difficulty referred to in this report is the lack of control of the actual test 
administration and personnel classification procedures as used in the 
training and operating units. 

* STUIT, DEWEY B. (Ed.) Personnel research and test development in the Bureau of 
Naval Personnel. Princeton: Princeton Univ. Press, 1947. Pp. xii+513. 

The book is divided into five parts, consisting of four or five chap- 
ters each. The individual chapters have been written by various mem- 
bers of the staffs of the Test and Research Section and cooperating 
agencies. Ordinarily the authors are those who were most active in 
carrying out the projects described. The first section is historical and 
descriptive. The second section is related to the selection and classifica- 
tion tests. The third reports projects on the prediction of success in 
training. The fourth describes the achievement measures developed, 
and the final section discusses follow-up studies and surveys. The ma- 
terials in the various parts will be discussed briefly in the paragraphs 
which follow. 


Part I: The Navy's Selection, Classification, and Training Program 


The authors of the chapters in this section are Ray N. Faulkner, 
Helen R. Haggerty, John H. Cornehlsen, Charles E. Odell, Eugene D. 
Carstater, and Howard T. Batchelder. This section describes the status 
of testing prior to the development of the Test and Research Section and 
describes briefly the various selection, classification, and training prob- 
lems with which the Navy was confronted. The task of selecting, classi- 
fying, and training 300,000 officers and 4,000,000 enlisted men during 
the period of the war presented a large and complex problem. The pro- 
cedures developed by the Test and Research Section to assist in solving 
these problems are discussed in the four later parts of the book. 


Part II: The Construction, Standardization, and Use 
of Selection and Classification Tests 


The authors of the chapters in this section are Guy L. Bond, Joseph 
Miller, William A. Owens, Dewey B. Stuit, Daniel D. Feder, and Milton 
Wexler. The classification tests used in the basic test batteries for en- 
listed and officer personnel follow rather standard forms. The verbal 
tests include opposites, sentence completion, and analogies. The reading 
materials include measures of the ability to note details, to draw infer- 
ences, and to follow directions, using materials related to Navy life. 
The arithmetical reasoning tests were of the conventional sort. The 
measures of mechanical aptitude in the enlisted men’s test included 
tests of block counting, comprehension of the principles involved in 
mechanical situations, presented in simple diagrams and pictures, and 
surface development items requiring the transformation from a two- 
dimensional to a three-dimensional figure shown in perspective. The 
officer qualification test had only the second of these types of mechanical 
comprehension items. The officer classification test presented, in 
addition, two spatial tests, a block assembly test, and a test requiring 
the rotation of solid figures. Other special functions tested in the basic 
test battery for enlisted men included mechanical knowledge (electri- 
cal), mechanical knowledge (mechanical), clerical aptitude (alphabetiz- 
ing, name checking, and number checking), spelling, and a radio code 
test. 

The reliabilities and intercorrelations of the various sub-tests are 
reported. It is unfortunate that although speed is a substantial factor in 
many of these tests, many of the reliability coefficients reported are 
computed from odd-even scores or by the Kuder-Richardson formulas. 
The only discussion of this fact which came to this reviewer’s attention 
was the following statement on page 70: “The alternate-form reliabilities 
of all tests are lower than the odd-even estimates. This result is to 
be expected in view of the fact that a test typically correlates with itself 
higher than with another test.” It is doubtful whether there is any value 
in computing Kuder-Richardson reliability coefficients on such a speed 
test as the clerical aptitude test. Certainly, where this is done, the un- 
sophisticated reader should be warned that the coefficient has very 
little practical value. 

Certain special aptitude tests were developed for predicting success 
in radar training. These included such tests as polar-grid coordinates, 
scale reading, and relative movements. 

The chapter on “Measures of Personal Adjustment” by Milton 
Wexler is of special interest. Although the problems of measuring 
personal adjustment were certainly not solved during the war, the extensive 
experience in this field laid the groundwork for a very 
productive period of research in the next few years. On pp. 140 and 
141, Mr. Wexler presents an excellent statement of some of the pitfalls 
inherent in research in personal adjustment. These include the fallacy 
of comparing the scores of individuals tested in hospital wards with 
those of recruits, the contamination of the criterion by knowledge of the 
test results, and the application of a scoring key to the same population 
on which it was developed. Suggestions for adequate research design are 
given. It is concluded that most of the types of items and formats in- 
cluded in the various personal adjustment inventories studied predict 
psychiatrists’ judgments about equally well. Predictions are not greatly 
improved by either the extensive multiplication of items or the inclusion 
of filler questions. Many of the most significant items may be classified 
under the heading of “conversion symptoms.” These include headaches, 
nervousness, dizzy spells, fainting, tachycardia, stomach pains, physical 
debility, and similar psychosomatic complaints. The author urges more 
thorough research, using a criterion based on actual success in adjustment. 


Part III: Prediction of Success in Training 


The authors of the chapters in this section are Herbert S. Conrad, 
Gerald V. Lannholm, James W. Maucker, Royal F. Bloom, Everett G. 
Brundage, and James F. Curtis. This section includes numerous corre- 
lation tables comparing scores on classification tests with grades in 
training schools. In general, the various selection tests were found to 
yield validity correlations between .40 and .60. In very few instances 
did information on age, education, or civilian experience add to the 
effectiveness of the predictions from aptitude test scores. 

The field personnel responsible for classification for training purposes 
made recommendations in the classification interview which were re- 
corded in four categories. The validity of these recommendations was 
studied for a sample of nearly 38,000 trainees. In spite of the fact that 
the classification interviewers had all of the test scores in front of them 
at the time they made their recommendations, the validity coefficients 
obtained for these recommendations were substantially lower than 
those obtained from the test scores alone. It must be concluded that in 
this situation interviewing actually detracted from the success of the 
process, and better results would have been achieved if assignment to a 
service school had been based on test scores alone. 

After summarizing the results of the various prediction studies the 
authors conclude that it would be very desirable to have more valida- 
tion work done using actual performance of assigned duties, rather than 
course grades. Efforts to improve the adequacy of course grades as 
measures of proficiency are reported in the next section. 


Part IV: The Construction and Use of Achievement Measures 


The authors of the chapters in this section are David G. Ryans, 
Rutherford B. Porter, Charles M. Harsh, Eugene D. Carstater, Daniel 
D. Feder, William R. Lawrence, Ruth M. Cruikshank, and Wesley C. 
Darling. The activities reported in this section were related to the preparation 
of achievement examinations (including both written examinations 
and performance tests), the improvement of the content and 
quality of instruction, and the accuracy of marking and grading. One of 
the methods used in relation to these objectives was the preparation and 
distribution to Navy instructors of a bulletin entitled, Constructing and 
Using Achievement Tests. This bulletin was very competently done and 
appears to have been effective. The written examinations developed 
within the section are generally of the multiple choice type and make 
extensive use of pictures and diagrams. 

In order to evaluate performance in hand tool or machine tool shop 
work, various gauges were developed to measure the size, squareness, or 
symmetry of a particular product. The scores were ordinarily inscribed 
on the gauge, penalizing the student objectively in proportion to the 
extent of his error. Check lists were prepared for scoring many perform- 
ance tests. These include (1) directions to the examiner, (2) a list of 
necessary tools, (3) exact directions to the trainee, and (4) a list of all 
items to be scored. The check list for scoring includes those aspects of 
each part of the task considered appropriate, such as accuracy, sequence, 
tool use, and speed. All items are objective and are scored by merely 
checking the list. 

A major project undertaken but not completed during the war was 
the development of a series of examinations for advancement in rating. 
Most of the examinations were written subject matter tests, but plans 
were also laid to construct performance tests covering the practical 
factors. The development of separate rating examinations for each pay 
grade in fifty ratings was made even more difficult because of the variety 
of equipment found aboard different types of ships and the varied 
nature of Navy jobs bearing the same name. This work with achieve- 
ment examinations emphasized the need for more accurate definition 
and evaluation of training results. It was found that the nature of the 
aptitude tests predictive of course grades depended on the evaluation 
procedures used in obtaining these grades. If written examinations 
stressing identification of parts and other academic information were 
used to obtain course grades, one set of aptitude tests was found to be 
the best predictor. On the other hand, if the course grades were based 
mainly on practical performance tests, very different aptitudes were 
indicated as the important requirements. 


Part V: Follow-Up Studies of Training and 
Classification Techniques 


The authors of the chapters in this section are Harold P. Bechtoldt, 
James W. Maucker, Dewey B. Stuit, C. Robert Pace, Norman Frederik- 
sen, and Eugene D. Carstater. Because of the central importance of 
criterion and follow-up studies in all personnel research, this section 
will have the greatest general interest to personnel psychologists. The 
first chapter is on “Problems in Establishing Criterion Measures,” by 
Harold Bechtoldt. It includes suggestions for developing an acceptable 
criterion. These are discussed under the headings of definition of the 
areas of performance, reducing the effect of extraneous variables, in- 
creasing the accuracy of measurement, increasing the comparability of 
criterion values, and combining criterion measures. Although the author 
does not pretend to answer all the problems relating to the development 
of a suitable criterion, this discussion is very much worth reading for 
anyone planning to do research on personnel problems. 

The discussion is quite comprehensive but there are two factors 
which, while not overlooked, seem to this reviewer to be given inade- 
quate emphasis. The first of these is the fundamental importance of 
systematic, directed, and recorded observations as a basis for making 
judgments in the form of ratings. The second is the distinction between 
what an individual can do and what he actually does do. 

The following chapter is also a “must” for anyone working in this 
field. Written by Harold Bechtoldt, James Maucker, and Dewey Stuit, 
it discusses prediction of performance of enlisted personnel aboard ship. 
It is almost exclusively concerned with a follow-up study of about 2,000 
enlisted men in six different types of duty assignments on 27 ships. 
After trying and rejecting rating scales and an adaptation of the nominating 
technique, a method based on the order of merit rankings made 
by supervising petty officers was adopted. The account of the many 
problems involved in the analysis and use of these criterion data pro- 
vides an excellent example of the many difficulties, complexities, and 
unsolved problems in this general area. With a few exceptions, fairly 
good agreement between the ranks assigned the same group of individ- 
uals by different petty officers was obtained. It was found, however, 
that little differentiation could be made by these supervisors with re- 
spect to petty officer qualities and technical competence. It was also found 
that there is a very substantial correlation between months of experience 
and the ratings. Other variables, notably amounts of civilian education 
and Navy school training, were found to be correlated with the criterion 
rankings. By rejecting certain cases and partialling out the effect of 
experience, adjusted criterion scores were obtained which were com- 
pared with the six selection test scores. All but one of the coefficients 
obtained were positive, and about half of the values exceeded the 
amount required for statistical significance at the one per cent level. 
The median correlation coefficient was about .25. 

The many variables considered in interpreting these results included 
the extent of knowledge of the selection test results by these petty 
officers and the effect of this knowledge—which was believed to be 
small—on the ranks they assigned; the extent to which knowledge of 
amount of civilian and Navy training influenced the rankings; and the 
possibility of a correlation between selection tests and amount of ex- 
perience as a result of a trend in the quality of individuals attending 
Navy schools. An additional factor, which is not discussed in this 
chapter, is the extent to which the judgments of supervising petty 
officers were really independent judgments. In many situations such 
ratings are found to have been based on reputation. The individual’s 
reputation is frequently derived from a pooling in informal conversation 
of relevant and irrelevant bits of information concerning him. 
The authors are certainly to be commended for carrying out this 
project and reporting it in considerable detail. Certainly much more 
work of this type is essential if progress is to be made in personnel re- 
search. 

The remaining chapters include one on “Information Surveys as 
Evaluative Devices” by Robert Pace, and a chapter on “Problems for 
Further Study” by Norman Frederiksen, Eugene Carstater, and Dewey 
Stuit. The chapter on surveys includes an excellent discussion of 
factors influencing the value and usefulness of opinion research. The 
last chapter discusses unsolved problems in connection with techniques 
of selection, classification, and training, and various methodological 
issues. The problems defined and the considered opinions of these in- 
dividuals with respect to promising research are well worth reading. 


In conclusion, it can be stated that much evidence is provided by 
this book of the great value of such a personnel research program. Fur- 
ther tangible evidence of the effectiveness of the work of this group is 
provided in the decision of the Navy to continue a program of personnel 
research and test development during peacetime. This book constitutes 
a valuable reference for all those interested in personnel research. 











A NOTE ON POSTMAN’S REVIEW OF THE 
LITERATURE ON THE LAW OF EFFECT¹ 


G. RAYMOND STONE 
University of Oklahoma 


It is too much to expect that a critical review of the extensive and 
controversial literature on the law of effect, no matter how excellently 
done, could itself escape some disagreement. The present writer presents 
this consideration as an excuse for taking exception to certain areas of 
Postman’s recent article (16), in particular to his treatment of the nega- 
tive aspect of the law of effect. The writer has had occasion to review 
this literature (19), and, like Thorndike (25), Hilgard and Marquis (3), 
McGeoch (9), and Postman himself (pp. 506-507), had reached the 
reasonable conclusion that more experimental work was needed before 
generalizations with respect to the influence of punishment became safe. 
However, the implied direction that generalizations would take is 
considered to be somewhat different from Postman’s, and is more like 
the point of view of Thorndike. 

The disagreement can be restricted here to three topics: (1) Post- 
man’s interpretation of Thorndike’s conclusion with respect to the 
general influence of punishment in learning; (2) the specific action of 
punishment upon the strength relations of single stimulus-response 
connections to which it is applied; and (3) the spread effect of negative 
incentives in serial multiple-choice learning. 

1. Postman’s interpretation of Thorndike’s conclusion with respect 
to the general influence of punishment in learning. Although it is admit- 
tedly not an ideal technique to take specific quotations out of context, 
the essential issue can be made clear by alternately quoting from Post- 
man’s article (16) and from Thorndike in his book The Psychology of 
Wants, Interests and Attitudes (25). 


Postman: Thorndike no longer considers punishment as an effective agent 
for the elimination of wrong responses (p. 502). 

Thorndike: An annoying after-effect does weaken the tendency which 
produced it ... when it does so, its method of action is often, perhaps always, 
indirect. That is, the person or animal is led by the annoying after-effect to 
do something else to the situation which makes him later less likely to 
follow the original connection (p. 71). 

Postman: Failing to find a significant weakening of connections following the 
announcement of Wrong, Thorndike denied the efficacy of punishment in 
general (p. 503). 

Thorndike: If the situations had not vanished and if the subjects had been 
permitted then and there to try another response after the failure of their 
first one (or first two, or first three, etc.), guided by their memory that 
such and such choices made a few seconds ago had been punished, these human 
subjects would presumably have profited more (or suffered less) from 
the punishment (p. 74). 

Postman: Does punishment have any significant effect on learning? We have 
already seen that Thorndike answers this question in the negative ... (p. 505). 

Thorndike: The best results are obtained from punishments when the annoying 
state of affairs then and there causes or encourages or at least permits 
the animal to operate a right connection and receive satisfaction therefor 
(p. 78). 

Postman: Punishment has no vital place in his picture of the learning 
process (p. 511). 

Thorndike: The absence of a punishment may be psychologically as positive 
a reward as words of praise or a money payment (p. 80). In all cases 
the benefit of the punishment lies in its power to provoke a change to or 
toward the desired behavior (pp. 78-79). 

Postman: Thorndike’s revised view of punishment startled the psychological 
public. Not only did it contradict the belief in the practical value of 
punishment ... it was also clearly at variance with a considerable body of other 
experimental data (p. 502). 

Thorndike: Punishment by pain, blame, disgrace, ridicule, etc., is an almost 
universal feature of most human societies, and perhaps of many animal 
groups. It has been a pillar in family life, was until recently the corner-stone 
of school discipline and industrial management, permeates law and penology, 
and is essential to most religions. It deserves study in all its aspects, 
including its origin (of which vengeance is only a minor fraction), its kinship 
with other social consequences of behavior, its theoretical justifications, 
ist and obvious misuses (p. 80). 

Postman: Against the blanket assertion that punishment is not instrumental in 
the elimination of wrong responses it is possible to cite a long list of papers 
covering more than half a century of experimental work which report that 
punishment is an effective condition of learning (pp. 505-506). 

Thorndike: Bunch ... and also Vaughn and Diserens ... found a gain from 
using moderate or slight shocks in human maze-learning, but since there 
were only two choices at each point of learning, so that the absence of a 
punishment may act as a reward, and since punishment for going into an 
alley caused the response of going out from it, their findings are not evidence 
concerning the potency of punishment per se. They may be evidence of the 
potency of reward, or rather of the potency of two rewards, namely, absence 
of a shock when entering a right alley and relief from shock when backing 
out from a wrong alley (pp. 73-74). 


¹ POSTMAN, LEO. The history and present status of the law of effect. THIS JOURNAL, 
1947, 44, 489-563. 


The blanket assertion of which Postman speaks cannot legitimately 
be traced to Thorndike. That the former quotes the latter correctly as 
to the indirect action of punishment in the elimination of responses 
(p. 502) is additional reason for surprise when he cites the long list of 
papers most of which, if not all, are irrelevant or at least not crucial to 
Thorndike’s position. The latter has himself analyzed some of these 
papers and related them to his own conclusion (25, pp. 73-74). Furthermore, 
far from believing that punishment is ineffective in learning, 
Thorndike has listed several rules as to how punishment can be made 
more effective (25, pp. 151-152). 

Thorndike is explicit in stating that although punishment does not 
directly weaken a connection, it does induce variability of behavior 
responses and thus may lead indirectly to an alternative rewarded or 
successful response. It is obvious that the effectiveness of punishment in 
learning is a function of the availability of alternative responses which 
may either be rewarded, or, if this is different, escape the punishment. 
It is when such alternative responses are not possible (the “vanishing 
situation” in multiple-choice learning) that punishment fails to be an 
effective active eliminator. When alternative rewarded responses are 
present, the elimination of punished responses is due not to an absolute 
weakening of the response punished but to the relatively greater strength 
of the response that punishment induced. 

I believe that Thorndike’s analysis is subject to a logical error, which 
will be referred to later, but for the point at issue here he is not at fault. 
When the process effect of punishment has been experimentally isolated 
from the effect of alternative rewards, conditions which Thorndike 
specifically requires,² the evidence is preponderantly in favor of Thorndike’s 
conclusion (1, 2, 6, 7, 8, 13, 14, 17, 18, 19, 20, 21, 23, 28, 29, 30). 


² “In multiple-choice learning by human subjects where the situation vanishes immediately 
after the choice, is replaced by another, and recurs only after an interval of 
50 to 200 sec. filled by other (usually thirty-nine) situations and responses, a connection 
punished by the announcement of ‘Wrong’... is (practically without exception) 
strengthened by the occurrence in spite of the annoying after-effect” (25, p. 72). 
Not all of this evidence is from Thorndike’s laboratory nor is it all restricted 
to “Thorndikian” experimental designs. Skinner (18) is quite 
doubtful as to the effects of negative reinforcement on the reflex reserve, 
and Estes (1) considers his monograph on punishment in operant con- 
ditioning as confirming the Thorndike conclusion. 

The belief in the practical value of punishment is not contradicted 
by Thorndike. Rather, a re-examination of the evidence may well help 
Postman find an answer to his own question (p. 546): “Why do organisms 
so frequently persist in behavior which is clearly punishing, or, at 
least, more punishing than rewarding?” 

2. The specific action of punishment upon the strength relations of 
single stimulus-response connections to which it is applied. In the usual 
learning situation where the organism is faced with a choice between 
alternative modes of behavior some of which are successful or rewarded, 
some others unsuccessful or punished, which mode of behavior is selected 
and finally fixated is a function of many possibilities at the level of the 
effect process. Some of the logical possibilities are: 

1. Rewards may be completely indifferent and the rewarded behavior se- 
lected only because punishment has actively eliminated or weakened the other 
modes of response leaving only the rewarded one (or ones) available. 

2. Punishments may induce new behavior but be indifferent with respect to 
the strength relations of the behavior punished, and the rewarded behavior se- 
lected only because the reward effect fixates behavior responses directly. 

3. Rewards may fixate directly and punishments actively eliminate directly. 

4. Punishments may fixate behavior directly but to a lesser extent than rewards, 
the punished behavior therefore being eliminated not by a process 
of active weakening but, instead, in the relative complex of competing response 
tendencies. 

Postman and Thorndike alike would probably discard the first pos- 
sibility. Thorndike has expressly preferred the second. Postman pre- 
sumably prefers the third. I would like to call attention to the fourth 
possibility. Postman has pointed out (p. 523) in another connection 
that the law of exercise is logically insecure. When, as by Thorndike 
(26), it is offered as a basic law in the selection and fixation of behavior 
responses, it is something like an admission of experimental failure. 
Exercise, or frequency of exercise, is itself a framework of conditions 
within which some process is an active determiner of results. The logi- 
cal status of the law of exercise is something like that of the old law of 
disuse which claimed forgetting as a function of time. McGeoch (10) 
has pointed out that it is not time but the processes which occur within 
the framework of time that can be offered as forgetting determiners. To 
quote Postman in this connection, “Frequency per se cannot be profitably 
considered a significant condition of learning. Rather, repeated 
pairings of stimulus and response allow effective conditions of learning 
... to exert their effects” (p. 523). 

There is nothing in the data presented by Thorndike which would 
violate the fourth possibility of the process effect of punishment mentioned 
above. Yet Thorndike’s statement is, “There is more gain in 
strength from the occurrence of the response than there is weakening by 
the attachment of ‘Wrong’ to it” (24, pp. 45-46). Contrary to Postman’s 
charge that Thorndike “indicates an unwillingness even to consider 
the possibility that punishment may have a weakening effect on 
a connection” (16, footnote, p. 503),³ this conclusion by Thorndike in 
the face of data which could be interpreted as a strengthening effect of 
punishment looks like a leaning over backwards, actually, a disbelief 
that punishment could do anything else but weaken. 

When Thorndike says that punishment induces variability of re- 
sponse, but does not weaken a connection, there is an immediate question 
of measurement control. In his experimental situation, either his S’s 
repeated or did not repeat (varied their response). There was no meas- 
ure of the strength of a connection other than its repetition. If it failed 
to be repeated, presumably it had been weakened. The effect of punish- 
ment was to induce variability—failure to repeat—and thus to weaken 
(at least relatively). For Thorndike this dilemma of punishment induc- 
ing variability but not weakening a connection is solved by employing 
an absolute value of repetition (chance expectancy) below which the 
number of the S’s repetitions must go before the connection can be said 
to be weakened. Postman has noted the criticism of others that Thorn- 
dike’s chance expectancy of repetition is suspect as the absolute base 
from which the effects of punishment are to be computed. When the 
effects of punishment are computed from an empirical base of repetition 
with no after-effect, the data are still inconclusive (p. 503). 

The present writer has performed an experiment of simple design in 
an attempt to produce data that are crucial to the point at issue (21).⁴ 
The experiment provided an empirical baseline of response repetition to 
serial items which had no incentive responses, but were inserted in a long 
list of other items which did have randomized incentive responses of 
“Right,” “Wrong,” and “No Response.” 


³ This charge was in connection with Thorndike’s interpreting neutral after-effects 
as “ambiguous enough to lead to self-administered reward since the subjects were 
free to interpret the neutral signal as a reward.” Thorndike (25, p. 74): “The greater 
strengthening of the connections with neutral after-effects may be explained by the 
hypothesis (a very likely one) that the neutral after-effect permitted an occasional con- 
firming reaction, whereas punishment never did.” 

⁴ The report of this experiment is still in press and was thus not available to Postman. 
The experiment also provided repetition data under exactly the 
same conditions except that the middle item of the isolated series was 
called “Wrong.” A comparison of the two groups of data could be made 
to see whether verbal punishment weakened a connection below either 
an empirical or a chance expectancy baseline. The results indicate no 
weakening (induced variability) whatsoever but actually a probable 
strengthening influence of verbal punishment. This is interpreted as 
confirming the fourth possibility of effect process mentioned at the 
beginning of this section. 

Postman’s point that the spoken word “Wrong” may not be a 
punishment or an annoyer (p. 505) I can controvert only by saying that 
the S’s acted as though annoyed when the word was applied. They 
reported also being disturbed by the fact that they could recognize that 
a response was wrong as they were making it, but they made it anyway. 
That the connection was strengthened is clear. That punishment rather 
than “exercise” was responsible for the strengthening seems probable 
from the comparison to the empirical base of repetition. 

3. The spread effect of negative incentives in serial multiple-choice 
learning. The Muenzinger and Dove experiment (15) which first offered 
the bidirectional gradient of variability across rewarded occasions surrounding 
a punished occasion as the inverse of the “Thorndike effect” 
(spread of a fixative reward effect across punished occasions) seemed to 
the present writer to be very much in need of some experimental con- 
trols. The experiment involved a Thorndikian design in which the S’s 
were presented a long list of items to which they were instructed to 
respond with some number between one and ten. Without the S’s 
knowledge, the E’s responses of “Right” and “Wrong” were applied to 
the S’s responses in a fixed order. Near the middle of the series the 
pattern of E’s responses was: one “Wrong,” seven “Right,” and another 
“Wrong.” It was the data for repetitions of this series that the authors 
offered in support of the bidirectional gradient of variability attributed 
to the punishment; i.e., the closer a rewarded occasion was to a punished 
one, the less it tended to be repeated. 

This conclusion assumed that if the punishments were not present 
the repetitions to the rewarded series would be uniformly level at some 
relatively high value, and that when the punishment was introduced at 
either end of the series it generated the reduction in repetition to the 
items near it. To the writer’s knowledge there has been no evidence 
offered that successive serial rewards would function in the manner 
assumed. It is conceivable that the punishments were irrelevant and the 
gradient was generated by the serial rewards themselves. 
The present writer has been obtaining data on this question (22), 
using a comparable experimental design but with the serial rewarded 
items isolated by items at either end having no incentive responses, and 
the repetition data as a whole compared to an empirical baseline of 
repetition with no incentive responses. The data, to date, do not sup- 
port a uniformly high level of response repetitions for the rewarded 
series, but, instead, appear to be cyclic in nature. This would, of course, 
confound the nature of the imputed gradients. 

It is reasonable to expect that serial rewarded occasions compete 
with each other, a point of view that is traceable back to Ebbinghaus’ 
remote excitatory tendencies, and is probably compatible with the 
modern concept of inhibition as interpreted by Wendt (31), Hovland 
(4), and Hull (5). McGeoch (9) has noted that studies of the Thorndike 
type, dealing as they do with conditions known to have spread effects, 
themselves do not adequately analyze the role of serial interference. 
Tilton (27) even proposes that a part of Thorndike’s gradients may be 
due to the spread effects of incentives other than the one to which they 
are attributed. The exact influence of the negative transfer effects of 
serially rewarded items upon each other is very much in need of 
investigation in designs other than the usual human rote memory 
studies. 

Further attempts have been made to extend the Muenzinger and 
Dove conclusion. If a bidirectional gradient of variability attributed to 
punishment is to have any generality, it should be operative when the 
punished occasion actually is surrounded by rewarded occasions. It is 
to be remembered that Muenzinger and Dove employed two punished 
items, one at either end of a rewarded series, but neither one actually 
surrounded by rewarded occasions. Comparable experiments (19, 20) 
have been performed but with the following designs: (1) a single verbal 
punishment inserted in the middle of a series of eight rewarded occa- 
sions; (2) two verbal punishments inserted in a rewarded series; (3) two 
verbal punishments, one preceded by four rewarded occasions and one 
followed by four rewarded occasions, but separated by a series of non-incentive 
items; and (4) the substitution of electric shocks for verbal 
punishment in designs one and three. 

In all cases of verbal punishment, the post-punishment gradient was 
the reverse of that expected if one were to generalize from the Muen- 
zinger and Dove results; i.e., the closer the rewarded occasion was to the 
punished occasion before it, the greater was its tendency to be repeated. 
When electric shock was used, the post-punishment data revealed no 
consistent gradient, but greater variability in general (20). Postman 
(p. 518) vaguely interprets these results in support of Muenzinger 
and Dove. 

Tilton (27) has performed a rather intricate and forced analysis of 
his data, concluding, finally, in favor of a bidirectional gradient of 
variability attributed to punishment. The data represent repetitions 
to rewarded occasions which intervene between two punished occasions, 
and are analyzed when the number of rewarded occasions is 2, 3, 4, 5, 
6, 7, 8, and 9 or more. It is significant that in all cases where five or more 
rewarded occasions come between two wrongs there is a confounding of 
gradient effects. In the special cases of five and six rewarded occasions 
there is a reversal of the post-punishment gradient comparable to that 
found by Stone (19, 20). Yet when Tilton selectively combines his 
data, a bidirectional gradient is revealed across four rewarded occasions 
intervening between two punishments. Data from only the first two 
steps following or preceding a punished item were used. If more had 
been used, the gradients would have been confounded. 

The exact nature of the post-punishment variability is questionable, 
but it is probably not the simple gradient implied by Muenzinger and 
Dove. This conclusion is supported by the results of Farber (2) who 
used a punch board maze designed so as to reveal the spread influence 
of wrong responses to adjoining right responses in either direction. He 
concluded that his data with respect to punishment gradients were 
“altogether ambiguous.” 

Conclusion. No claim is made here for substituting stable generali- 
zations derived from experimental data in the place of Postman’s noted 
lack of such generalizations. There is, indeed, a current need for both 
definitive and parametric experimentation. That the latter depends 
somewhat upon the standardization of the former (11, 12) however, is 
warning not to fall easy prey to the historical precedent of constructing 
Thorndike from straw. The man and his works have outlived not only 
adverse criticism but also psychology’s twentieth century revolutions— 
the Gestalt, the psychoanalytic, and, greatest of them all, Watsonian 
behaviorism. 
NOTE ON NEU’S REVIEW OF THE LITERATURE 
ON ABSOLUTE PITCH 


A. BACHEM 
Department of Physiology, University of Illinois 


Concluding his critical review¹ of the literature on absolute and relative 
pitch and on the improvement of pitch judgment through learning, 
D. M. Neu states: “It follows then that ‘absolute pitch’ is nothing more 
than a fine degree of accuracy of pitch discrimination. The pitch of a 
tone is the sound that the particular individual learns as that particular 
tone.” 


These statements must be rejected for several reasons: 


1. Pitch (shortly defined) is the psychological counterpart of the fre- 
quency of air vibration. It consists of 2 components: 


a. Tone height, a logarithmic function of frequency. 
b. Chroma, a cyclic function of frequency with octave periodicity. 


2. Absolute pitch (shortly defined) is the ability to recognize (and 
identify) the pitch of a tone without the aid of a reference tone. 

3. Absolute pitch cannot be treated as an entity since absolute pitch 
identification is possible by different methods: 


a. Every normal person can distinguish low and high tones well enough for 
a crude estimation of their tone height. If the person is familiar with the piano 
scale, he can identify a heard tone by pointing at the corresponding key with 
errors up to several octaves. This ability is better for musical than for unmusical 
persons and it can be improved by experience and intentional learning. Many 
American psychologists name this ability absolute pitch. I prefer the name 
pseudo-absolute pitch for this faculty, since it differs considerably from genuine 
absolute pitch, the only type that musicians acknowledge as “absolute pitch.” 

b. Many singers are able to estimate tone height roughly through their 
laryngeal proprioceptor experience from their own singing. Some authors call 
this faculty absolute pitch, although use is made of a vocal standard (singing 
and humming). In my classification I refer to it as quasi-absolute pitch. 

c. Genuine absolute pitch is of an entirely different nature. Possessors of 
genuine absolute pitch base their ability upon the recognition of that quality 
of pitch that recurs with octave periodicity: C-ness, D-ness, etc., called “chroma” 
in my discussion. This quality is as specific and spontaneous as a color or an 
odor. Non-possessors of genuine absolute pitch have no concept of this pitch 
quality, just as totally colorblind persons cannot conceive the qualities of color. 
Several European psychologists with absolute pitch have pointed to this pitch 


¹ Neu, D. M. A critical review of the literature on “absolute pitch.” Psychol. Bull., 
1947, 44, 249-266. 
modality and its relationship to genuine absolute pitch. I have proven the correctness 
of this assumption by the comparative study of the error distribution 
curves of persons with and without absolute pitch, and by a comparative octave 
experiment on both types of reactors. Neu did not mention these experiments 
in his review, although they are fundamental for the evaluation of genuine 
absolute pitch. 

d. Since genuine absolute pitch makes use of a finer method of pitch deter- 
mination, its properties differ considerably from those of the other types of ab- 
solute pitch. The accuracy is of a different order of magnitude (about one-tenth 
of a semitone vs. 3 to 10 semitones). The judgment is immediate and certain 
in contrast to mere guessing. It depends very little upon familiarity with timbre. 


4. In my experiments on the various types of absolute pitch I included 
many subjects without absolute pitch as “controls.” Most examiners 
quoted by Neu did not observe a single person with absolute 
pitch in their studies on the development of absolute pitch by learning. 
If they had included one person with absolute pitch, or if they had 
possessed absolute pitch themselves, they would have arrived at differ- 
ent conclusions. 
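
The two components of pitch distinguished in point 1 can be illustrated numerically. What follows is only a minimal sketch and is not drawn from the experiments discussed here: it assumes equal temperament and a reference of A = 440 cycles per second, and the names `tone_height`, `chroma`, `chroma_name`, and `semitone_error` are hypothetical. Tone height is taken as the base-2 logarithm of the frequency ratio (in octaves); chroma is taken as the position within the octave, which recurs with octave periodicity.

```python
import math

REFERENCE_HZ = 440.0  # assumed reference tone (A); an illustration, not from the text
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def tone_height(freq_hz: float) -> float:
    """Tone height in octaves relative to the reference: logarithmic in frequency."""
    return math.log2(freq_hz / REFERENCE_HZ)


def chroma(freq_hz: float) -> float:
    """Chroma in semitones above A (0 <= value < 12): cyclic with octave period."""
    return (12.0 * tone_height(freq_hz)) % 12.0


def chroma_name(freq_hz: float) -> str:
    """Nearest equal-tempered note name, disregarding the octave."""
    return NOTE_NAMES[round(chroma(freq_hz)) % 12]


def semitone_error(judged_hz: float, actual_hz: float) -> float:
    """Signed error of a pitch judgment, expressed in semitones."""
    return 12.0 * math.log2(judged_hz / actual_hz)
```

Under these assumptions, tones an octave apart (440 and 880 cycles) differ by one unit of tone height but share the same chroma. The accuracies quoted in point 3d (about one-tenth of a semitone as against 3 to 10 semitones) are magnitudes of the kind that `semitone_error` would return.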


One final statement in Neu’s paper claims: “Basically, the interbehavioral 
explanation of absolute pitch rejects the conception of an 
inherent (=inherited) quality.” Using the same arguments, a colorblind 
behaviorist could explain away an “inherent” faculty of color 
vision, by observing himself and other achromatics learning to stop for 
a red traffic light. The role that inheritance plays in the development 
of genuine absolute pitch is indicated by the following facts: 


1. The occurrence of absolute pitch in families. (I reported 18 such families 
with as many as 11 members having absolute pitch. The influence of “musical 
atmosphere” can be ruled out in some of these. In most musical families ab- 
solute pitch is not observed.) 

2. The close association of genuine absolute pitch with musical talent 
(assumedly inherited). 

3. The appearance of absolute pitch in connection with the first musical 
experience. 

4. The fact that no adult person has ever acquired genuine absolute pitch 
by learning. (All statements to the contrary do not refer to genuine absolute 
pitch.) 
SUGGESTIBILITY AND NARCOSIS—A REJOINDER 


H. J. EYSENCK 
Psychological Laboratory, Institute of Psychiatry, Maudsley Hospital 


In a recent issue of the Psychological Bulletin appeared a review of 
‘The present status of psychotherapeutic counseling’ by W. U. 
Snyder (2). This review contains a critical evaluation of a paper by 
Eysenck and Rees (1) dealing with “States of heightened suggestibility: 
narcosis.” As this review may seriously mislead readers unacquainted 
with the original article, it seemed appropriate to discuss briefly the 
factual inaccuracies contained in Snyder’s paper. His review reads as 
follows: 


Eysenck and Rees... performed experiments which appeared to show that 
suggestibility of subjects is not greatly affected by the use of barbiturate drugs 
or nitrous oxide inhalation. The task suggested to the subjects was the squeezing 
of a bulb while listening to a gramophone suggestion that they do so. These 
authors conclude that neurotic individuals are more suggestible than normals, 
but absolutely no criteria of neuroticism were given and the total of neurotic 
and normal patients was only 30. 


The statements contained in this extract may be treated seriatim. 


1. “The total number of neurotic and normal patients was only 30.” 
In actual fact there were 50 patients altogether; these were all neurotics. 
No normal patients were tested. These facts are made perfectly plain in 
the original paper. The imputation that the results do not prove what 
we claim they prove because of the small number of subjects is incor- 
rect; tests of significance were carried out and showed a P<.001. 

2. “Suggestibility . . . is not greatly affected by the use of (narcotics).” 
In actual fact, our conclusions read: “Suggestible patients 
become more suggestible after injection of sodium amytal in subanaesthetic 
doses.” “Suggestible patients become more suggestible after 
inhalation of nitrous oxide in subanaesthetic doses.” “Non-suggestible 
patients remain non-suggestible after the administration of these two 
narcotics.” Snyder’s summary on this point is definitely misleading, 
and does not deal at all with our main point. 

3. “These authors conclude that neurotic individuals are more suggestible 
than normals, but absolutely no criteria of neuroticism were 
given.” Nowhere in the article do we conclude that neurotic individuals 
are more suggestible than normals, consequently it is difficult to see 
what criteria Snyder would like to see, or in what connection. All our 
patients were neurotic in the sense of being referred to Mill Hill Emergency 
Hospital with the notation “Neurosis.” While thus there is no attempt 
whatever in our paper to deduce the relationship between neurosis 
and suggestibility, reference is made to the fact that in other work 
(summarized recently in the writer’s book Dimensions of Personality) 
a very close connection was found. Full criteria for presence or absence 
of neuroticism were given both in the article quoted in our paper, and 
in the book, which is primarily concerned with the isolation and meas- 
urement of this personality variable. 





A great responsibility rests on those who abstract parts of the scien- 
tific literature. Many readers depend on the accuracy of their reporting, 
and the fairness of their summaries. Snyder’s report and summary fall 
short of any reasonable standard of accuracy. Little value can attach 
to conclusions reached on the basis of such cursory reading. 
SHERIF, MUZAFER, AND CANTRIL, HADLEY. The psychology of ego-involvements: 
social attitudes and identifications. New York: John Wiley, 
(London: Chapman & Hall, Ltd.), 1947. Pp. viii+525. 


Much work has been done in social psychology, yet little seems to be 
settled. The subject-matter has at once an understandable, almost 
self-evident character and resists curiously the advances of the investi- 
gator. In an optimistic moment the impression may grow that the facts 
and principles in this region stand ready to be plucked if the observer 
will simply turn his gaze in their direction. But then a more sober mood 
provokes the uncomfortable suspicion that the foundations for thinking 
and investigation are by no means secure, and the more radical doubt 
that we do not as yet possess a body of knowledge in this region to which 
the most modest sciences can lay claim. In this unsettled state of affairs 
there is clearly a need to seek for new concepts and for a new starting 
point. In the light of this situation, the effort of Sherif and Cantril to 
provide a more systematic approach to social psychology gains signifi- 
cance and the evaluation of it becomes of interest. 

The Psychology of Ego-Involvements revolves around one principal 
question: How does the human individual who is not at the outset a 
member of society become a part of the social order, and what are the 
basic processes that bring him and keep him under the sway of group 
forces? To this huge question the authors return a seemingly simple 
and inclusive answer, which is the persistent theme of the work. The 
lever of the explanation is their conception of the individual as an ego. 
The individual functions on the social scene as a character of a particu- 
lar kind. He forms in the course of development a self, consisting of 
needs and aspirations; it is in this character that he figures in relation to 
himself and to others. It is this self that lends the tang of significance 
and urgency to the individual’s actions, that each is constantly at- 
tempting to preserve and further, and that brings each individual under 
the control of the group. Sherif and Cantril undertake to describe the 
properties of this self and to show how it is shaped by group conditions. 
The skeleton of their work consists of their characterization of the self 
or ego, while its body contains the evidence they marshal in support of 
their propositions. 

Given this task it becomes all-important to understand the kind of 
character the ego is. At the outset the authors stress that the ego is in 
large part a motivational structure, with the important qualification 
that being constantly directed and devoted to itself the motivation is 
“ego-involved.” Further, the ego consists in large part of attitudes and 
values, which turn out to be also the attitudes and values of other 
egos, in particular of those belonging to the same group. It becomes 
evident that for the authors the psychology of the ego includes what is 
ordinarily subsumed under social psychology. 

This starting point decides the plan of the work. Having identified 
the formation of attitudes and values with functions of the ego, the 
authors proceed to give a detailed account of the psychology of attitudes. 
This they do in terms of the conception of “social norms” familiar 
to readers of their earlier writings. Since the ego is the product of a long 
development they devote a chapter to the emergent awareness of the 
self in the child, basing themselves on the descriptions of child psy- 
chology. An extended discussion follows of the reorganization of the self 
in adolescence. The authors consider this examination important for 
their thesis, since they view the period of adolescence as an attempt 
to reestablish the stability of a self undergoing change. Their account 
follows closely the writings of ethnological workers and of sociological 
investigations of adolescents under current American conditions. Here 
they stress the intense efforts of adolescents to find a place in their own 
groups, their mounting need for social approval and the consequent 
tyrannical hold that group standards come to have for them. As the 
relation to groups is for the authors the most ego-involved of activities, 
they proceed to a detailed examination of the formation of adolescent 
gangs and cliques. For evidence they draw primarily upon the investigations 
of American sociologists. The work also contains a discussion of 
the impairments of the self under conditions of extreme deprivation, and 
of its deterioration under organic illness and in consequence of social 
disorganization. 

The aim of the authors has been clearly to broaden the foundations 
of social psychology and to extend the application of its principles. 
Their effort to characterize the human individual as a relatively unified 
entity rather than as a collection of motives and habits is evidence of 
the seriousness of their problem. Equally noteworthy is the close at- 
tention they pay to the neighboring discipline of sociology as a source of 
evidence and as a testing ground for psychological propositions. These 
features of their work should be of value in directing the student to 
problems and disciplines that are often excluded from more conven- 
tional discussions in social psychology. The student will also find the 
work helpful as a source of convenient reference to topics and investiga- 
tions that are otherwise scattered. 

It is now necessary to ask how well the authors have succeeded in 
their task. In the opinion of the present reviewer there is serious reason 
to doubt the adequacy of the psychological concepts employed in this 
work and the methods the authors pursue in establishing their conclu- 
sions. Because space forbids reference to the numerous topics of which 
the volume treats it seems necessary to restrict the following remarks 
to a few main points, namely, (1) the characterization of the ego; (2) 
the doctrine of social norms; and (3) the method of the work. 

1. The relations of the ego to the surroundings. Though the authors do 
not deal in detail with problems of motivation they have necessarily to 
face constantly the question: What forces prompt the ego to action? 
Their answer takes the form of a single and comprehensive proposition: 
The ego strives to belong to the group. By this is meant that the indi-
vidual is continually striving for “social acceptance and approval.”
This he must do if he is to achieve a condition of stability. To meet this 
condition he is willing to pay any price, to submit to any ordeals. 
Because he stands in this relation of dependence to the group the indi- 
vidual takes over its standards and values, whatever these may be. 
The first and last condition of social existence is to conform. The parti- 
cular demands that a group may make on the person vary greatly, but 
the necessity to conform remains the one invariant that overrides all 
changes. 


In order to achieve this (i.e. the secure anchoring of the ego) he has to and 
wants to identify himself with the group or groups in question. He does his 
level best to incorporate ... the norms of the group, whatever they may be in 
his particular setting. He has to and he wants to conform to them in his be- 
havior. If conformity to these norms is achieved by ruthless competition and 
individualism, he does his level best to be competitive and individualistic to the 
limits of his capacity. If the norms of his group put a great premium on being 
cooperative, he does his best to be cooperative (p. 276). 


At this point we see bluntly that the relation of the individual to his 
group is conceived to be generally that of anvil to hammer. That 
powerful social interests and attachments may develop because men 
do have qualities of interest and fascination for each other, or because 
social life does hold out the possibility of solving actual problems, and 
that the strivings of people may be to an important degree determined 
by the perception of those facts—these assertions would probably not 
be denied by Sherif and Cantril, but they do not figure in their account. 
Nor do they consider it significant in this connection that under certain 
conditions men refuse to conform to social demands that violate their 
sense of what is right or wrong. Instead a process of conformity is as- 
sumed to be central, which functions as a Trojan horse that each society 
slips into its prospective candidates. Given this prime motive the ego 
appears as a culturally determined go-getter, grimly striving at all 
times to find a place and to enhance his position. 

The preceding comments make it apparent that the authors are 
describing a particular kind of ego, one that is self-centered, for whom 
“the personal world . . . becomes centered around himself” (p. 93). At 
this point we encounter a difficulty that dogs the work at every step. 
To say that a person is ego-involved is not yet to say in what relation 
the ego stands to the objects with which he is concerned. These may be
of different kinds, and it would seem to be the task of psychology to 
describe and study them. There are instances—and they are of the 
utmost importance for social psychology—in which not the ego but the 
surroundings and their requirements are for the ego the center. Sherif 
and Cantril do speak of the latter, but then they treat them as being 
psychologically exactly identical with those of the first kind. We now 
discover that when a man rushes to the help of another it is also because 
he is ego-involved, that the Christian martyrs and Stakhanovite work- 
ers were all ego-involved. When the authors cite the attitude of being 
part of something more important than oneself as another instance of 
ego-involvement the confusion seems complete. They can do so only at 
the expense of emptying their key concept of concrete meaning. Either 
ego-involvement must from this point on represent the neutral assertion 
that a psychological structure called an ego is a necessary condition for 
certain processes, or it must mean that exploiting another and helping 
another are psychologically identical processes. This is the consequence 
of assuming that an action is sufficiently characterized when one can
call it my action. It seems clear that the authors have tripped over the 
little, treacherous pronoun my, which is able to alter its content in the 
most surprising ways. That the problem is avoided and the distinction 
denied seems to be due to the assumption of the writers that the issue is 
one of ethics, and therefore outside the pale of psychology. They have 
failed to consider that ethical differences need not exclude psychological 
issues. This is probably the reason that they fail to interpret properly 
the significance of certain investigations that bear directly on the ques- 
tion, such as those of H. B. Lewis and of Lippitt and Lewin. 

The final consequence of the failure to make the first necessary dis-
tinctions is that the term “ego-involvement” becomes indistinguishable
from the colorless paraphrase “there is a motivated individual in ac-
tion.” In fact, at one point the authors seem to reduce their notion to
the sense of intensity of motivation: the more intensely a person is 
motivated the more ego-involved he is said to be. It would be difficult 
to see what assertions in the book would be significantly altered if one 
were to systematically substitute for ego-involvement the general term 
motivation in its current use. 

2. The formation of attitudes and values. How egos of the kind de- 
scribed achieve the formation of a social order is dealt with in terms of 
the doctrine of norms first formulated by Sherif. Readers will encounter 
the now familiar propositions that the regulations of society are first 
external to the individual, that these are incorporated or “internalized,” 
that this is accomplished by a process of learning, and that the norms 
possess a framework character. As will be evident from the earlier re- 
marks, the process of becoming a group member consists for the authors 
“mainly in the achievement of conformity in experience and behavior 
to social values, standards, or norms already established” (p. 11). Be-
cause this portion of their work is based most directly on investigation
in social psychology rather than upon evidence from neighboring disci- 
plines it may be well to scrutinize it more closely. 

A host of phenomena in the social field seem to stand in a strange 
relation to this characterization of norms. Men hire themselves out to 
work for a wage; this is a standard condition to which millions must 
conform. Is factory labor a norm in the sense of Sherif? Are hospitals 
and schools norms? Was the transformation of serfs and artisans into 
wage laborers a product of changing norms? There seems to be a range 
of institutions and activities that develop not out of norms but out of 
very hard conditions and also out of the existing state of knowledge and 
technique. At least in relation to these, the formulations of Sherif are 
inadequate and can be misleading. One might perhaps answer that the 
attitudes of people toward work, toward medicine and jails contain a 
norm character. Assuming that this is the case, would it not still be 
necessary first to establish how the given attitude is determined by the 
correct grasp of actually given conditions? It seems to the reviewer that 
the psychological account of norms by Sherif and Cantril has a tacit 
reference to those actions of men in society the basis of which they do not 
really understand. One may be permitted to question whether even this 
limited attempt can succeed without first becoming clear about those 
features of their situation that members of society are able to grasp 
adequately. 

As in the case of “ego-involvement” the authors here too rely on a
psychological concept the content of which they assume to be self-evi-
dent, but which is actually laden with ambiguities and problems. We
refer to the notion of “conformity.” Now conformity may refer to the
realization that clearly given objective conditions make necessary a 
given course of action. Once it is understood that there need be traffic rules 
the necessity to conform follows. Conformity may also represent sub- 
mission to sheer force. Or it may be the consequence of thoughtless 
habit. Does not a psychology that proposes to speak of the foundations 
of social action need to distinguish between these? 

We are forced to the conclusion that the doctrine of norms over- 
looks at least one feature that is vital to the establishment of social 
order. It ignores those categories of action that are grounded in clearly 
perceived conditions, and directs itself first and foremost to social mis- 
conceptions, which are then taken as the prototype of social situations. 
In no other way does it seem possible to understand the insistence that 
norms are first external to the individual and then internalized. Does 
not this statement apply to all objects in our surroundings, including 
the principles of science? Does it follow that the binomial theorem is a 
norm? If not, can one exclude that there may be social truths that are 
grasped similarly? “External” seems actually to mean that which is
alien to the properties and tendencies of individuals, and which can
therefore become part of them only by a process of imposition. In
consequence the only way in which such data—which for the authors
comprise the greater region of social happenings—can become part of the
person is by means of sheer acceptance or “internalization.” Yet even
when a man “internalizes” a chicken he does not turn into one. Why
then should the doubtful privilege of internalization be reserved for the 
most significant relations into which men enter? 

3. The question of method. The manner in which the authors establish 
support for their theoretical views calls for comment. The procedure is 
to state a proposition and to quote extensively from diverse sources in 
support of it. The authors state at the outset that it required virtually 
no effort on their part to find verification for their views in the investi- 
gations of social psychology, in the writings of sociology and also in 
works of literature to which they devote a full chapter. But the very 
ease with which they establish their conclusions arouses disquiet. In 
the course of an appeal to an enormous diversity of sources hardly a 
shadow of disagreement ever arises with regard to any point of fact or 
theory (with the sole exception of Freud, with whom the authors take 
issue, and in the course of which they make a number of highly pertinent 
comments). Are we to conclude that social psychology is immune from 
the doubts and uncertainties that beset the most mature sciences? Is 
it really the case that we can read off support for psychological proposi- 
tions from the contributions of sociology and anthropology? 

A closer examination reveals at this point one of the grave difficulties 
this work encounters. It seems to the reviewer that between the evi- 
dence upon which the authors draw for support and their own assertions 
there is often an immense gap. In the accounts cited of youthful groups, 
as well as in the chapter devoted to the illustrations from literature, the 
authors refer with evident approval to situations which describe impres- 
sive and often touching accounts of human courage, helpfulness and 
sacrifice. They give due credit to the presence of these qualities even 
in delinquent gangs of adolescents. But all these accounts are simply 
taken to be exemplifications of their concepts. Does this mean that the 
egos of Shakespeare and Euripides have been preoccupied with the 
ego-involvements of their characters, and that they have all along been 
providing confirming evidence for the psychology of Sherif and Cantril? 
If so, only one comment seems adequate. There is no clear bridge, and 
there seems indeed to be an opposition, between the ideas of the authors 
and such strange phenomena as responsibility and mutual respect. It 
does not seem possible to dispose at one point of the problem of values 
by dubbing them “affective fixations” and later to speak of values in
terms of the poet or of the ordinary human being.

That the authors commit this confusion does them in one sense 
credit; they have been clearly striving toward a view that would do 
justice to their own perceptions and values. But the concepts they have
introduced cannot be so readily dismissed. Their personal comprehen- 
sion of the human situations with which they deal is often far superior 
to their theoretical tools; it becomes then only too easy to assimilate the 
latter to the former without any corresponding deepening of under- 
standing. This procedure of attempting to gain sanction for their views 
introduces grave confusion for theory. 

It is appropriate to appraise a scientific inquiry by its fruitfulness in 
bringing to light new principles or facts, or the reinterpretation of known 
facts. In this light one must conclude that The Psychology of Ego In- 
volvements introduces no change in our understanding. The old and 
unsatisfactory edifice of social psychology remains intact. The investi- 
gator following the labors of the authors will return to his own tasks 
with the chastened feeling that in the forest of social psychology the 
first paths still have to be carved out. 

S. E. ASCH.

Swarthmore College. 


ALLPORT, G. W., & POSTMAN, L. The psychology of rumor. New York:
Henry Holt, 1947. Pp. xiv+247. 


In times of crisis, psychologists are besieged with requests to 
“explain” rumors. Many of us avoid the issue by replying in vague 
generalities or by pointing out that it is impossible to exercise adequate 
scientific controls on genuine rumors. Allport and Postman have ac- 
cepted the challenge without equivocation. The result is an extremely 
readable and informative little volume, based in part on their own 
research. 

A brief review of wartime rumors and the work of rumor clinics is 
followed by a discussion of why it is that rumors circulate. This involves 
the proposal that the “amount of rumor in circulation will vary with
the importance of the subject to the individuals concerned times the
ambiguity of the evidence pertaining to the topic at issue” (p. 34).
Rumor relieves, justifies, and explains the underlying emotional ten- 
sion; projection is common in rumors. 

The authors describe a series of experiments or demonstrations of 
what purports to be a rumor process. In some thirty instances, a 
picture viewed by an audience was described to a person who could not 
see the screen on which the picture was projected. The latter person 
then reported the description to a third subject who came into the room, 
and so on until six or seven subjects had repeated the report, all before 
the audience. The results were of considerable didactic value to the 
members of the audience. Examination of the protocols showed that 
the “rumor” tended (1) to become shorter, more concise, more easily
grasped and told (leveling and sharpening), (2) to be assimilated to the 
emotional context existing in the listener’s mind, (3) to be shifted in 
theme or elaborated. These changes resulted either from misunder-
standing or from the effort to make the rumor more meaningful. About 
70 per cent of the details were eliminated by the fifth repetition. With 
children as subjects the results were even more pronounced. 

Throughout the book reference is made to the work of Whipple, 
Binet, Stern, Bartlett, and others on recall, testimony, and rumor. A 
chapter on “Rumor in Society” discusses rumor in history, the nature of
legend, the metaphorical significance of rumor and legend, the classifica- 
tion of rumors, the fusion of passions and antipathies, rumor publics, 
the press, whispering campaigns, and the function of rumor in riots. An 
analysis of seven cases of rumor from the literature and a guide for the 
analysis of rumor complete the book. An appendix presents standards 
for agencies working on the prevention and control of wartime rumors. 
The bibliography will be useful to students and psychologists alike. 

Although the book is evidently written for popular consumption, 
psychologists and social scientists will find it stimulating and provoca- 
tive. With this in mind, it is not derogatory to point out that the book 
has certain shortcomings as a psychological treatise. For instance, the 
authors present a hypothesis in the guise of a formula (R ~ i × a) which
the reader may reasonably expect to find tested in the experiments.
The “experiments” turn out to concern principally the fidelity of report.
Although they illustrate many of the principles of rumor, no actual rumor
is involved. While the results are not inconsistent with the hypothesis,
no attempt seems to have been made to vary systematically and quan-
titatively either the interests of the subjects (i) or the ambiguity of the
situations (a). Thus the experiments seem not to have been designed to 
test the formula. 

Although the richness of the prose in the last three chapters en- 
hances the literary value of the book, it may lead to semantic difficulty. 
One knows, of course, that the following language is not intended to be 
taken literally, but where could one find a better example of the “or-
ganismic fallacy” in social theory?


Within the social organism the bacilli of rumor are always active. Sometimes 
they move sluggishly in a non-virulent fashion. Sometimes they burst forth in 
a fever of violent activity. The fever, unfortunately, burns most dangerously 
when the health of the social organism is least able to withstand its ravages 
(p. 193). 


For the purpose of combating rumors, it may be expedient to gen- 
eralize and oversimplify in discussing the motivation behind them, but 
such a procedure is hazardous in a psychological analysis. Allport, in 
his Personality, has decried the tendency to attribute general and uni- 
versal motives to all men. It seems to the reviewer that there is a good 
deal of over-simplification and generalization regarding motives in the 
Psychology of Rumor. One example is to be found in the contention that 
the circulation of certain rumors concerning Negroes is an attempt on 
the part of whites to evade guilt for the treatment and status of Negroes.


Guilt evasion is likewise detectable in innumerable rumors detailing inci- 
dents of the Negro’s criminal and disloyal tendencies. One wartime story had it 
that Negroes were not being drafted as rapidly as whites because authorities 
were afraid to let them get their hands on guns (p. 177). 


The story is attributed to H. W. Odum’s Race and Rumors of Race. 
Allport and Postman have not stated that guilt-evasion is the only 
possible motive for the rumor, but the story is treated as a hostility 
rumor and no other explanation is offered. Most psychologists have a 
way of assimilating “racial” rumors to relatively standard psychological
frames of reference, and the authors are no exceptions in this case. 
Granted that this could well be a hostility rumor representing guilt- 
evasion on the part of whites, is it necessarily so? It could have been 
accepted and spread by Negroes as a hostility rumor against white 
“authorities.” The reviewer had not encountered this rumor previously,
but he and many others could have granted it some credence, or at 
least have assimilated it to their own frames of reference, without 
hostility against Negroes and without the sort of guilt-evasion mentioned 
by the authors. As a supply officer for a white squadron in pre-Overseas 
Replacement Depot training at a Northern air field for a brief period 
during World War II, the reviewer was ordered to issue rifles to the 
white soldiers, for which he had to account daily. The white supply 
officer for the colored squadron undergoing the same sort of training did 
not have to account for any rifles, because it was not ‘policy’ to issue
firearms (even without ammunition) to Negro troops. The reviewer has no 
statistics on the drafting of Negroes early in the war, and his other 
evidence is at best anecdotal, but it puts the tendency to accept and 
circulate such a rumor on a different basis from that postulated by the 
authors. 

In other instances the authors attempt to “combat” rumors by the 
simple expedient of “name-calling.” (E.g., referring to rumors like
“The Jews are evading the draft” as “monstrosities current early in the
war” (p. 11).) Such minor faults as have been mentioned, if they are
faults, are hardly sufficient to detract from the excellence and thorough- 
ness of the book as a whole. 


ARTHUR JENNESS. 
Williams College. 


KRACAUER, SIEGFRIED. From Caligari to Hitler. A psychological history 
of the German film. Princeton: Princeton Univ. Press, 1947. Pp. 361. 


The sub-title of Kracauer’s book indicates its method and its pur- 
pose. It is a psychological history in the sense that it purports to dis- 
cover the forces which underlie and foreshadow historical events by 
the analysis of the material provided by the motion picture. It is 
Kracauer’s thesis that in films clues are to be found which reveal the
“collective dispositions” of a people. These inner dispositions deter-
mine the course of historical events and, when known, make possible
the prediction of specific historical outcomes. While external causes 
may determine inner attitudes, when once established these tendencies, 
according to Kracauer, may assume an independent life and themselves 
become the springs of historical evolution. The pictorial and narrative 
motifs in films and other mass media reflect these hidden psychological 
dispositions and compulsions, and thus become data for the psychologi- 
cal historian. 

The period analyzed is that between the appearance of the famous 
film The Cabinet of Dr. Caligari in 1920 and the rise of Hitler in 1933. 
Kracauer feels that the “strange reactions of the German masses,” their 
inertia, the tremendous impact of Hitler, and the refusal of many Ger- 
mans, until the last moment, to take him seriously are not adequately 
explained by obvious economic and political factors. Rather, deep- 
seated psychological patterns and hidden motivations explain these 
events. The frustrations of the middle classes, the “white collar pre- 
tensions” and rationalizations of the working class, the psychological 
retreat of both these groups from reality, their ambivalent attitudes 
towards power and authority, and their symbolic quests for security 
and a way out of their dilemmas are among the psychological patterns 
which Kracauer finds reflected in the films produced and shown in 
Germany during this period. 

Many of these motifs are to be found in Caligari. Major themes of 
this film, according to Kracauer, indicate ambivalent reactions to
authority and express “a strong appetite for sadism and destruction.”
Like the Nazi world which it foreshadows, this film is filled with “sinister 
portents, acts of terror and outbursts of panic.” 

Kracauer has performed a ground-breaking task. There is no more 
pressing problem for the social psychology of the mass media of com- 
munication than the analysis of the factors which condition film content 
and underlie the impact of that content on the audience. Many of 
Kracauer’s analyses are discerning and contain illuminating insights. 
The period in question, so crucial for Germany and the world, needs 
the kind of study which perhaps only the social psychologist can make. 
But for this reviewer, Kracauer’s study has grave defects both in method 
and conceptualization. His analysis of film content assumes two levels 
of meaning, one of which is manifest and one of which is hidden. The 
last carries the “real” meaning for the mass audience. This audience is
presumed to intuit or in some manner become aware of the motifs in
the film which satisfy and express its hidden needs. This seems to mean
that the “unconscious” intuitions of the makers of films communicate
to the “unconscious” minds of the mass audience. Most social psycholo-
gists will not be satisfied with this interpretation. They will inquire
how we may know, objectively, that the film carries the meanings which
Kracauer finds in it. Neither will they accept a terminology which in-
cludes such phrases as “collective mind,” “collective soul,” and other
terms which reify collective behavior. 

In spite of these defects, this book must be studied by all serious 
students of film history and the role of films in society. It is magnifi-
cently illustrated, and contains an appendix entitled “Propaganda and
the Nazi War Film” which is an extremely useful analysis of specific 
examples of Hitler’s war films. 

FRANKLIN FEARING. 

University of California, Los Angeles. 


SOROKIN, PITIRIM A. Society, culture, and personality: their structure and
dynamics. New York & London: Harper, 1947. Pp. xiv+742. 


Psychologists usually find the whole field of psychology too broad 
to follow in detail, and find it impossible to keep track of the developments
in all the borderline sciences. When a psychologist does climb out of 
his diggings to get a perspective on his work, he usually lines his sights 
up with physiology and biology, with the reflex and the rat. 

Sociology is our far border, which we tend to shun. Sorokin’s 
volume provides a perspective on many psychological problems as seen 
from the standpoint of a science of sociology, of the super-organic. Here 
is a monumental work by a leading scholar which largely ignores the 
details of modern psychology, but in which many important psycho- 
logical concepts appear as essential factors in the analysis of the struc- 
ture and dynamic functions of social groups and societies, in the analysis 
of the rise and fall of civilizations. A few of these likely concepts may be 
mentioned. 

1. Personality is a central concept. The systematic problems of soci- 
ology hinge around the three terms in the title. The generic model of 
socio-cultural phenomena is the interaction of two or more persons, and 
these involve an irreducible triangle of items: society, or the groupings 
and organization of individuals; their culture, or systems of meanings, 
values, norms, their gestures and actions of intercommunication, and 
the vehicles which express and objectify these meanings; and the 
personalities of the members, which are molded in their psycho-social, 
and also in part their biological, properties by the culture in which they 
are raised. There is a parallelism of socio-cultural processes and per- 
sonality which is worked out in elaborate detail. 

2. Personality is not unitary. There is a pluralism of selves in the 
individual, which is a reflection of the pluralism of groups to which the 
individual belongs and within which he has definite and distinct roles.
The various selves or “egos” may be classified into two divisions: bio-
logical and social. There are as many biological egos as there are
biological needs, Freud notwithstanding. These are now and then
antagonistic, and are certainly different in the child from the mature
person. Social egos sanction and rationally bless the satisfactions of 
bodily needs when they follow socially prescribed means. There are as 
many social egos as there are groups of which we are members, such as 
the family, occupation, religion, and within which our responses are 
patterned. Integration and disintegration, conflict and antagonism, and 
breakdown within the individual clearly reflect the level of integration 
of the ideological, behavioral and material cultures of the individual. 

3. Normative motivation of most social behavior is the major feature of a 
group, an institution, or a social system. Normative behavior is not pur- 
posive, nor shrewdly calculating utilitarian behavior. As for conscience, 
the stimulating conditions for the motivation are grounded in the past, 
and the reactions are immediate. These judgments and convictions are 
the basis of law and ethics. There are several kinds of normative acts: 
law-norms, moral-norms, technical norms (how to make biscuits), and 
norms of fashion or etiquette. The law-norms are two-sided, contrac- 
tual. They involve rights and duties, what you are entitled to demand 
or to deliver, and they have the force of law and government behind 
them. Moral norms on the other hand move in the direction of ethical 
perfection. They only urge and recommend certain conduct, but cannot 
expect it. Thus, they are only one-sided, free to be fulfilled. A person, 
even a soldier, is not expected or obliged to be a hero. 

4. A group may be solidary, internally antagonistic, or mixed in its
systems of interaction. The solidary is integrated, predominantly family- 
like, with their lives intermingled. Legal-norms are somewhat out of 
place. Mixed groups are of the contractual type (law-norms), where the 
persons may remain strangers, but organize formally for mutual pro- 
tection or benefit. The compulsory type of organization is antagonistic, 
despotic, based on force and supplemented by fraud. Some pseudo- 
familistic and pseudo-contractual groups are actually compulsory. The 
inner world of each is closed to the other; and the dominant group may 
begin to think of the others as outcasts, of an impure race, “dogs.”

5. There is one main and simple condition for national and international 
peace. It is the presence in society of a well-integrated system of basic 
values, and corresponding norms practiced in overt behavior, in har- 
mony with each other, and based primarily on the principle of the Gold- 
en Rule. Violence, revolution and wars occur when the value-systems 
are not just different, but incompatible, antagonistic. Sorokin sees the 
many programs for peace to be important as factors, but completely 
futile without a fundamental shift in the value-systems of the cultures 
of our time. 

These and many other suggestive notions appear throughout this 
book. It is clearly a rational-empirical, non-experimental formulation;
it is by turns profound and over-elaborating, but never fluent; and pre-
sents many concepts that should be highly provocative to social psy-
chologists.


RALPH H. GUNDLACH.
University of Washington. 


HALL, R. B. Area studies: with special reference to their implications for 
research in the social sciences. Social Science Research Council 
Pamphlet, No. 3. New York: Social Science Research Council, 
1947. Pp. v+89. 


Social psychologists shudder fastidiously at the thought of ethno- 
centrism; yet, of all social scientists, social psychologists are among the 
least willing to engage in research beyond the boundaries of the United 
States. That social psychologists, for all their obeisance to the concept 
of culture, are essentially cultural homebodies, is suggested by the fact 
that they have had little part in (or even awareness of) the develop- 
ments surveyed in this pamphlet by Robert Hall. 

Hall’s survey deals both with foreign area programs and with pro-
grams which center upon the United States (or some region thereof);
but he gives overwhelming emphasis to programs concerned with areas
outside the United States. The survey is timely. “Area programs”
have mushroomed on almost every campus. Some of these programs 
represent a development which no social scientist can ignore; others are 
old wine in new bottles; and others—regrettably—are flabby and mere- 
tricious concessions to fashion. Hall has performed a useful service in 
underscoring those aspects of area study which merit the serious atten- 
tion of all social scientists. 

In area study, experts from various academic fields pool their special 
skills and special knowledge in an attempt to understand a given area. 
The “area” may be a nation, a region within a nation, or a group of 
closely related nations (e.g., Latin America). 

The proponents of foreign area study, according to Hall, rest their 
case on three major arguments. First, it is argued that the universities 
have an obligation to the nation not only to develop a citizenry ade- 
quately informed as to the rest of the world, but to create a pool of 
foreign area experts and to accumulate a body of precise knowledge on 
foreign areas. The second argument, reminiscent of F. S. C. Northrop, 
emphasizes the fact that “we have studied men isolated in the milieu 
of the North Atlantic, thinking that we have been studying man.” Our 
generalizations lack universality, according to this argument, because 
we have been unwilling to range widely in the examination of other 
cultures. 

The third argument emphasizes the opportunity which area study 
provides for forcing the self-isolated academic disciplines into produc- 
tive collaboration. Repetitious talk of the virtues of collaboration has 
accomplished little; collaboration will come about when the various 
disciplines are faced with concrete problems which demand a collabora- 
tive attack. The intensive study of an area, so the argument runs, pro- 
vides just such a concrete problem. 

But the current enthusiasm for area studies has not gone uncriti- 
cized. The two most widely heard criticisms, according to Hall, are, 
first, that area study is not a “discipline,” that it has no “hard core”; 
and, second, that men trained as “area experts” without specialization 
in a traditional discipline will find it hard to get jobs. Hall points out 
that both objections are based on the misconception that area programs 
reject specialization in one discipline. It is true that at the under- 
graduate level (where the objective is a liberal education rather than 
professional specialization) area studies rest on a broader base than the 
normal major, but in this way are supported by the current trend away 
from intensive specialization in undergraduate years. At the graduate 
level, the best area programs require the student to select a field of func- 
tional specialization to accompany his area specialty. Each of the stu- 
dents at the Columbia University Russian Institute, for example, 
works toward an M.A. degree in one of the regular academic depart- 
ments at the same time that he is working toward his certificate from the 
Institute. In the best area programs, students beyond the M.A. level 
usually carry on their work in a regular academic department, although 
they are expected to relate this discipline to their area interest, and will 
presumably select as a thesis topic some problem having to do with the 
area. 

Hall’s report is of necessity general in nature. After reading it 
through, the social psychologist is likely to find himself with certain 
unanswered questions bearing on his own field. Precisely what sorts of 
contribution could social psychology make to an area training program? 
What sorts of researchable problem might the social psychologist find 
if he turned his hand to area research? Hall might well point out that 
it is the responsibility of the social psychologist to answer these ques- 
tions for himself. 

JOHN W. GARDNER. 

Carnegie Corporation of New York. 


KITAY, P. M. Radicalism and conservatism toward conventional religion. 
Teachers College, Columbia Univ., Contr. to Educ., No. 919. New 
York: Bureau of Publications, Teachers College, Columbia Univ., 
1947. Pp. viii+117. 


This volume deals not with radical and conservative approaches to 
religious problems but with differences in personal experiences and 
opinions between end groups of a college student sample. 

This book examines communicated materials ranging from responses 
to questionnaires in the area of radicalism-conservatism to autobio- 
graphical life histories. About 36 percent of an urban Jewish sample 
were analyzed. They constituted two equal groups, selected for highest 
and lowest scores on a scale measuring attitudes toward the church. 
These groups were labelled Pro-church (conservatives) and Anti-church 
(radicals). The analytical techniques employed categories that were 
susceptible of quantitative treatment, such as items that can be directly 
observed, relationships that can be counted for favorable or unfavorable 
ratings, and statistics that rely on concepts of the probability curve for 
the significance of inferences (principally Chi Square). 

The two groups differ significantly on such items as church attend- 
ance, religiousness of the parents, satisfactory home adjustments, and 
traumatic experiences, though hardly at all on many items such as 
intelligence. Other questions investigated, probably limited in produc- 
tiveness by this methodological framework, relate to causes of radical 
attitudes, to the existence of a generalized radicalism, and the produc- 
tiveness of communicated life histories. 

Though there is much need today to encourage and seek generous 
support for such social researches, this fact should not blind us to the 
recognition that the reconstructive work on social issues must be done 
chiefly by discrediting current habits and causing a rethinking of al- 
ternatives. It is true we must trust those methods that are products of 
our reason; but, if we bank too heavily upon them where they are 
poorly adapted, they will bind us to low performances. It seems rea- 
sonable here to blame some inadequate returns or low confidence upon 
over-devotion to methods found applicable to analyzing certain kinds of 
group differences. Effective analysis may be hampered by self- or 
culturally-imposed pressures to stay within the boundaries of chance 
curve statistics, the limiting rules of a narrow science, the simpler levels 
of classification, and respondents’ communicated materials. Radicalism 
and conservatism are dynamical personal constructs, evolved from long 
developmental and adaptive processes of adjustment to external situa- 
tions and cultural obstacles. Time and energy expended to understand 
more fully structural levels and relations, developmental processes, and 
dynamical pressures viewed both from within and without, may be 
more fruitful in clarifying vision and understanding than the number 
of frequency distributions or the refinements of statistical measures. 

However, within the framework of the accepted method and task, 
the product merits approval and support. 

James L. GRAHAM. 

Lehigh University. 


WORTIS, S. B., et al. Physiological and psychological factors in sex behavior. 
Annals New York Academy of Sciences, 1947. Vol. XLVII, Art. 
5. Pp. 603-664. 


This compilation of five individual papers summarizes the portions 
of a conference held by the New York Academy of Sciences in 1946. 
In a brief introduction S. B. Wortis comments on the need for revision 
of social attitudes toward human sex behavior—revision based upon a 
wider understanding of the biologic and cultural factors contributing to 
man’s sex life. 

W. C. Young competently reviews the evidence concerning endocrine 
control of sex behavior in male and female animals. The general ap- 
proach is that of the physiologist. The treatment is confined to lower 
mammals. 

W. E. Galt discusses in interesting fashion the sex behavior of mon- 
keys and apes. The familiar point is made that the female’s physiolog- 
ical condition tends to control the mating behavior of both sexes, al- 
though various socio-psychological factors sometimes lead to copu- 
latory activity when the female is not in estrus. The role of learning in 
the ape’s sex performance is discussed and the occurrence of bisexual 
activity by males and females receives consideration. 

In two and one-half pages A. C. Kinsey summarizes his interview 
data on sex behavior in human beings. 

Morris Herman's discussion of aberrant sex behavior in humans is 
based largely upon clinical histories presented with psychiatric interpre- 
tations. “Aberrant sex behavior can best be defined as sex activity 
utilized by preference as an end-point in gratification, despite the oppor- 
tunity and ready availability of heterosexual genital contact.” The 
socio-legal aspects of the field are accorded some attention. 

The volume closes with a presentation of the cultural anthropol- 
ogist’s point of view as expressed by Gregory Bateson. He stresses the 
importance of individual experience in man’s sex life and points out that 
cultural channelization of behavior is by far the most important agent 
in structuring human sexuality. The discussion is almost entirely theo- 
retical. 

One is disappointed by the absence of any real integration of the 
material presented by the individual authors. Several of the papers are 
useful and informative but there is no attempt to pull them together so 
that the reader can decide for himself whether there are any threads of 
continuity in present knowledge of the physiology and psychology of 
sex behavior in various species. The general impression is that rats are 
rats, monkeys are monkeys, men are men and any similarities are purely 
coincidental. For most psychologists the most thought-provoking 
contribution would have been that of Professor Kinsey whose address at 
the American Psychological Association meetings in 1946 evoked a 
great deal of interest. The reviewer attended this conference and heard 
Kinsey’s contribution which was exceedingly stimulating and elicited 
considerable discussion. Unfortunately the paper is presented in a form 
so condensed as to be tantalizing rather than satisfying. The reason 
given is that the author is reserving the factual material for a forth- 
coming book. 
F. A. BEACH. 


Yale University. 


SMITH, G. MILTON. A simplified guide to statistics. (Rev. Ed.) New York: 
Rinehart, 1946. Pp. xiv+109. 


“This Guide is primarily intended to clarify and to supplement the 
material in general and laboratory courses ...” (p. viii). The contents 
include chapters on distributions, central tendency, variability, norms 
and grading, standard scores, normal curve, reliability of means, small 
samples (t), correlation, reliability and validity, and chi square. For 
some unknown reason regression is omitted but such specialized topics 
as biserial r and Fisher’s method of combining independently obtained 
probabilities by means of chi-square are included. 
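
For readers unfamiliar with the last-named topic, Fisher's method refers -2 times the sum of the natural logarithms of k independent probabilities to chi-square with 2k degrees of freedom. The sketch below is illustrative only and is not drawn from Smith's text; the function name `fisher_combined_p` is hypothetical, and the tail probability relies on the closed form that holds for even degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values by Fisher's method (hypothetical helper)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    k = len(p_values)
    # Test statistic: -2 * sum(ln p_i); chi-square with 2k degrees of freedom
    # when every individual null hypothesis is true.
    chi_sq = -2.0 * sum(math.log(p) for p in p_values)
    # Upper tail of chi-square with even df = 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = chi_sq / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return chi_sq, 2 * k, tail

# Two independent experiments, each significant only at the .10 level.
chi_sq, df, combined = fisher_combined_p([0.10, 0.10])
print(round(chi_sq, 3), df, round(combined, 4))
```

With a single probability the combined value reduces to that probability itself, which is a convenient check on the arithmetic.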

Smith’s book has a number of good points; e.g., experimental orien- 
tation, natural introduction to some topics by means of examples, and 
exercises. But these good points are, unfortunately, out-weighed by a 
plethora of meaningless statements, pointless statements, and outright 
errors. To give a few examples: to state that “age to nearest birthday” 
scores are “mathematically more precise” than “age to last birthday” 
scores (p. 9) seems meaningless if not incorrect. It seems pointless to 
remark that “There is no simple explanation of these last two percent- 
ages; they are derived by integrating the normal curve between certain 
limits. But this need not concern us here” (p. 33). Also if any instructor 
is enlightened by the “Note for instructors” on p. 54, he is in desperate 
need of a decent elementary course in statistics. Errors, though seldom 
egregious, are numerous. The PE should not be “used interchangeably 
with Q” (p. 33). The rank-difference coefficient, rho, is not “best 
suited to studies involving a small number of cases” (p. 73). Although 
usually very accurate on the interpretation of confidence intervals, Smith 
slips up on the asymmetry of the interval for true percentages other 
than 50 (p. 62) and makes another type of error with respect to the con- 
fidence interval for chi-square (p. 89). On p. 80, we learn that a high 
validity coefficient indicates validity only if accompanied by high reli- 
ability coefficients. And so it goes. 
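
The point about asymmetry can be illustrated numerically. The sketch below uses the Wilson score interval, which is one standard construction chosen here as an assumption for illustration; it is not necessarily the interval Smith or the reviewer had in mind, and the names `wilson_interval` and `asymmetry` are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not (0 <= successes <= n):
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

def asymmetry(successes, n, z=1.96):
    """Upper arm minus lower arm of the interval, measured from the observed proportion."""
    p_hat = successes / n
    low, high = wilson_interval(successes, n, z)
    return (high - p_hat) - (p_hat - low)

# Observed 50 per cent: the two arms are equal. Observed 10 per cent: the upper arm is longer.
for successes in (25, 5):
    low, high = wilson_interval(successes, 50)
    print(successes / 50, round(low, 3), round(high, 3))
```

The interval is centered on a value pulled toward 50 per cent, so it is symmetric about the observed percentage only when that percentage is itself 50.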

There is probably a need for a short “Guide to Statistics,” but the 
reviewer feels that the quality of this book is not high enough for it to 
merit extensive use. 


DAVID A. GRANT. 
University of Wisconsin. 


MURSELL, JAMES L. Psychological testing. New York: Longmans, 
Green, 1947. Pp. xiv+449. 


Had the author written this book in 1940, it might have been 
considered a reasonably sound, up-to-date treatment of theoretical and 
practical aspects of psychological testing. However, the reviewer was 
disturbed to find no mention of many of the relatively new tests which 
are of widespread current usage. Notable omissions, for example, are 
the Kuder Preference Record, the Minnesota Multiphasic Personality 
Inventory, and the Bennett Mechanical Comprehension Test. A similar 
impression is gained from an examination of the bibliography, which 
indicates negligible reference to the literature appearing after 1939. 

The author presents with unusual adequacy certain introductory 
concepts, such as the nature of psychological tests and some of the 
requirements of instruments (reliability, validity, objectivity, and stand- 
ardization) in the first two chapters. Other fundamental concepts 
having to do with test-score interpretation are, for reasons not well 
understood by the reviewer, placed near the end of the book. That test 
scores have no known zero origin and that the units are unlikely to be 
equal are discussed in the final chapter, though many teachers would 
prefer to deal with this material relatively early, as a basis for intro- 
ducing the point that test scores become meaningful only in terms of 
their ordinal position in a distribution of scores for a group of known 
characteristics. 

Mursell devotes four chapters (144 pages) to his discussion of the 
concept of intelligence and to methods for its measurement. His treat- 
ment is sound and rather thorough, departing insufficiently from tradi- 
tion to arouse appreciable controversy or adverse criticism. His cover- 
age of the New Stanford-Binet is much greater than that of the Wechs- 
ler-Bellevue, an emphasis which is usual but which the reviewer would 
prefer to see discontinued in books of this nature. 

In his treatment of tests in other areas than intelligence the author 
seems to betray a less than desirable intimacy with his materials. This 
point becomes somewhat apparent in the chapter devoted to aptitude 
testing, where a more extensive treatment of the difficult problems of 
validation might have been expected. The point becomes more evident 
in the author’s discussion of personality testing, where there appears 
little attempt to cope with the complex conceptual problems involved 
in this area, and also the use of somewhat unusual criteria for evaluating 
particular tests. For example, the author seems impressed with the 
Humm-Wadsworth Temperament Scale partially because of its restricted 
sale, while he neglects to mention the MMPI, which was constructed 
on a similar basis (meeting with Mursell’s approval) but standardized 
more carefully and less obscurely. Moreover, the reviewer cannot im- 
agine that students are likely to develop interest in the fundamental 
problems of personality testing when a test is referred to as an “unde- 
fined hodge-podge” or as “a piece of pseudoscience.” 

The final chapter deals with what Mursell regards as evolving im- 
provements in testing practice and theory which are “richer in psy- 
chological significance.” Here are covered such topics as the concept 
of measurement, the various suggested techniques for test-score inter- 
pretation, some standardization problems, factor analysis, and pro- 
jective techniques. The discussion amounts to a fairly adequate defini- 
tive introduction to these problems. However, the reviewer would disa- 
gree that whereas “a psychometric test is committed to measurement 
..., a projective test is committed to diagnosis.” It would seem more 
appropriate to admit that “measurement” is not accomplished by 
either, and to regard both merely as stimulating situations aimed at 
eliciting responses for evaluation by the psychologist. 

In general, it may be said that the book will be well adapted to those 
courses in which the traditional emphasis on intelligence is maintained, 
that it appears to be as nearly adequate as any book of its kind yet to 
appear, but that, awaiting the publication of a more satisfactory general 
treatment of testing, instructors will still be obliged to supplement the 
textbook liberally. 

BERT R. SAPPENFIELD. 

Montana State University. 


BAUMGARTEN-TRAMER, FRANZISKA. Der Rorschach-Test im Lichte der 
experimentellen Psychologie (The Rorschach test in the light of ex- 
perimental psychology). Archivio di Psicologia Neurologia e Psichia- 
tria, 1946, Vol. VIII, fasc. II (Milano). Pp. 37. 


European psychologists are reluctant to use the Rorschach test 
because of its lack of standardized techniques, in spite of its valuable 
personalistic approach to the study of personality. 

The symptomatic value of the method as one based on individual 
characteristics of perceptual attitude is held to be impaired by the fact 
that the general laws of perception, particularly those of Gestalt 
psychology, have been ignored. Neglect of the content in the evalua- 
tion of the subject’s response eliminates an important personal char- 
acteristic. Further, the author holds that the test has not been validated 
for normal subjects and that no developmental norms are available. 
There are no definite standards for objective scoring of the responses 
and the personalistic interpretation depends largely on the experi- 
menter’s intuition. In administering the test the factors of illumination 
and the subject’s visual acuity and reaction time are disregarded. 

In making these criticisms the author refers exclusively to Ror- 
schach’s original book, Psychodiagnostik, without considering the work 
done in this country during the last two decades. American Rorschach 
workers would fully endorse the need for far more research. But they 
would also point to their serious attempts to make scoring more objec- 
tive and to satisfactory results in validating the test against other 
criteria of normal and abnormal personality organization. 
ANNELIES A. ROSE. 
Smith College. 


CRAWFORD, A. B., & BURNHAM, P. S. Forecasting college achievement: a 
survey of aptitude tests for higher education. New Haven: Yale Univ. 
Press, 1946. Pp. xxi+291. 


This book is Part I of a three-part study in “forecasting individuals’ 
relative promise for differential fields of study” or in establishing 
“differential measures of relative ‘educability’ at the higher levels.” It 
emerges from a project undertaken by the senior author in 1938 for the 
American Council on Education. The seven chapters of Part I repre- 
sent essentially a survey and critical analysis of the basic problem, with 
only one chapter given over to preliminary reports of the Yale Battery 
of educational aptitude measures. Thus, without present access to 
Parts II and III, the reviewer’s task is difficult: he may agree with the 
defects and shortcomings pointed out by the authors in the works of 
others, yet he is unable to assess their success in eliminating such faults 
in their own research. 

Chapter I sets the definitions and boundaries of the problem of 
differential aptitude measurement at the college level: aptitude is 
classically defined as the “ability to acquire skill”; the origin of apti- 
tudes so defined is not to be investigated; subjects of study are assumed 
to differ in the “nature of the mental processes which they require”; 
educational aptitude testing is under investigation, rather than indus- 
trial or professional (musical or artistic) aptitudes; variance within the 
individual is to be measured by emphasizing “ready adaptation of past 
learning to the solution of new problems”; aptitudes as defined may be 
complex configurations or combinations, rather than factorially pure 
components of mental life; they may provide gross differentiation 
toward broad educational divisions. 

Chapter II is the inevitable introduction to statistics. By some mis- 
chance of organization, Chapter VII becomes an extension of this in- 
troduction to cover familiar basic principles of test construction. It is 
doubtful if “readers with little or no statistical training” will achieve 
the statistical understanding contemplated by the authors when these 
two chapters were written. 

Chapter III reviews the limitations of tests of general intelligence as 
differential predictive measures, reiterating as a fact rather than an 
assumption the idea that different fields of study require “quite different 
types of thinking ability.” 

Chapter IV deals with achievement testing in its dual role of pre- 
dictor and criterion. The use of achievement tests as more reliable 
criteria of accomplishment is viewed favorably by the authors, but they 
apparently dismiss the value of achievement tests as predictors of 
differential performances at the college level. 

Chapter V, together with appendices A and B, presents data on the 
Yale Battery, comprising seven tests, grouped to have “the following 
directional significance”: liberal arts study, pure science and mathe- 
matics, and phases of applied science. Criteria of grade averages and of 
standard achievement tests are used for Yale undergraduates and Navy 
V-12 trainees, respectively, in establishing the validity of the Yale 
aptitude tests. To the possibly jaundiced eye of this reviewer, the 
validity coefficients are about what would be expected from the general 
literature on the prediction of college success, direct or differential, and, 
without cross-validational evidence from other populations, would ap- 
pear to occasion no special revolutionary conclusions. Nor do the test 
materials seem markedly divergent from materials currently appearing 
in commercially available instruments. 

Chapter VI is the authors’ tour-de-force, given over largely to a con- 
certed attack on Thurstone’s studies of primary mental abilities. This 
is done ostensibly “to place aptitude testing (as arbitrarily defined for 
the purposes of this volume) in its proper intermediate setting” between 
achievement testing and “the long search for underlying mental fac- 
tors.” But one suspects a more genially malicious purpose than this, in 
view of the authors’ detailed criticisms and felicity of phrasing. 

It is to be hoped that later parts of this investigation will be fruitful. 
Certainly the problem with which it deals is vital. But the present book 
is essentially in the stage of the usual review-of-the-literature section of 
a thesis, done at the high critical level of which the senior author is 
capable. However, the reviewer feels that within its own covers this 
volume adds little to the literature at present. The bibliography omits 
not a few items of pertinence; some of the critical comments, penetrating 
and sophisticated a few years ago, are by now assimilated into current 
procedures and knowledge; and the authors’ cursory treatment of topics 
ultimately is borne in on the reader. For example, phrases such as the 
following stud the book to the point of ultimate irritation: “... cannot 
be adequately discussed here ...,” “... space does not permit ...,” 
“... too involved for full discussion here ...,” “... scope is limited,” 
“... no attempt at full coverage ...,” “... the foregoing cursory 
remarks ....” Either these topics are worth discussing or they should 
be omitted. This is not alone a matter of style; this book purports to 
deal with the problems of the organization of mental life and the predic- 
tion of behavior at a high intellectual level. Such a topic deserves the 
most careful consideration, on theoretical and practical grounds, that 
research workers can bring to it. The present volume falls short of this 
level of consideration, in the opinion of the reviewer. 
JOHN G. DARLEY. 

University of Minnesota. 


ERICKSON, C. E. (Ed.) A basic text for guidance workers. New York: 
Prentice-Hall, 1947. Pp. x+566. 


This book by twenty authors is designed to orient teachers towards 
guidance and to “furnish counselors a beginning reference.” Its em- 
phasis is intended to be on the various “aspects of a guidance program,” 
and is limited to high schools. 

An introductory chapter on the role of guidance services (Erickson) 
is balanced in its conception of the roles of teachers and specialists but 
focuses on the vocational guidance services outlined in Myers’ text 
(implying, as Myers did not, that these constitute the extent of a guid- 
ance program). The careless use of terms characterizes this and other 
chapters. This chapter is followed by a treatment of basic growth con- 
cepts (C. V. Millard) which emphasizes techniques of studying growth 
at the expense of insights into the ways in which aspects of growth take 
place and affect adjustment. Diagnostic instruments (R. H. Dresher) 
are surveyed in a way which provides some idea of available tests, but 
which contains errors such as the following: personality inventories are 
called tests, and are stated to be most useful in individual counseling 
(rather than in screening); manual dexterity is treated as mechanical 
comprehension in one place and listed separately in another; the Cali- 
fornia Mental Maturity Test, which the publisher’s sales policy does 
not permit the Psychological Corporation to handle, is listed as pub- 
lished by the latter organization. These are minor errors; more funda- 
mental is the perpetuation of the perverse custom of listing tests with 
minimum publication data, thus leading the novice to think that he 
knows all he needs to know to use the tests, with no indication of such 
considerations as the much greater complexity of the Primary Mental 
Abilities Tests as compared to the Otis, and no indication that the 
Strong is a highly validated interest inventory whereas the Lee-Thorpe 
is virtually unvalidated. 

The chapter on case-study techniques (W. R. Baller) is written, not 
to teach how to make better case studies, but for college professors who 
use case studies as an instructional and testing device. Four chapters 
(S. A. Hamrin, H. B. Pepinsky, C. E. Erickson, and P. L. Dressel) deal 
with counseling. The first and third are superficial, although Hamrin 
shows commendable balance in his approach to the non-directive vs. 
directive controversy; Erickson repeats data on diagnosis and gives half 
of his 24 pages to reproduction of forms. Pepinsky’s treatment draws 
effectively on his and Bordin’s work at Minnesota; Dressel uses a “pre- 
sented problem,” diagnostic classification, but combines it effectively 
with the type of insight into underlying problems which characterizes 
Pepinsky’s work. 

Three chapters (F. B. Dizon, J. B. Munson, G. B. Munson and L. B. 
Schloerb) deal with group methods, emphasizing occupational informa- 
tion and self-appraisal by tests and inventories. These are well covered 
and show the need to coordinate group with individual approaches, but 
they neglect the topic of group therapy. Like many other chapters, they 
tend to emphasize administrative matters at the expense of techniques. 

Occupational surveys are treated (E. K. Wilson) in a concrete man- 
ner too rare in this and other texts. The chapter on working with home 
and community (L. J. Luker) is weak on both theory and method. Work 
experience as an orientation technique (C. A. Weber) is competently 
discussed. The treatment of placement and follow-up (L. O. Brockman 
and L. Smith) omits techniques of organizing and maintaining applicant, 
employer, and job files, is not specific on employer-contact work, and 
neglects planning job-seeking campaigns. Chapters on faculty growth 
(E. L. Harden) and organizing a guidance program (C. M. Horn) are 
realistic, the latter being especially good in its discussion of steps in 
starting programs of vocational and educational guidance. 

A final chapter on sources of information (G. E. Smith) describes 
several elementary professional books, but includes none on the psychol- 
ogy of adjustment, measurement, or counseling; the list of journals, on 
the other hand, includes such items as Psychometrika and the American 
Journal of Psychiatry. Some important errors are made in the discus- 
sion of census data (p. 476), where the small percentage of male social 
workers and teachers is interpreted as meaning that these are fields 
with little demand for men, rather than fields in which the demand for 
men cannot be met. This chapter is broader in content than its title 
suggests, dealing also with methods such as career days which belong 
in earlier chapters. 

As is often the case in symposia, this book suffers from inadequate 
coordination between authors and from insufficient editing. The authors 
do not all write as specialists in the topics they cover. They address 
themselves to different levels of sophistication, omit important topics, 
and overlap; at times one even wonders whether an author knew what 
the editor wanted him to write about and whether the editor read the 
chapter after it was written (e.g., case study techniques). The un- 
organized appendix, the incomplete reading lists, documentation, the 
listing of an address which has been changed two years (N.V.G.A., p. 
457) and the omission of another (Educational and Psychological Meas- 
urement, p. 458), the loose use of terminology as in discussing teaching 
pupils good placement procedures (p. 15) and in a topic head (p. 39) 
which refers to remedial tests for a paragraph which makes no mention 
of diagnostic tests or other techniques of determining the type and 
extent of remedial work needed—these are a sample of ways in which 
editorial work was deficient. 

Despite the intended emphasis on growth and development (p. vii 
and Ch. 2) the book fails to deal adequately with the psychology of ad- 
justment, whether in the home, the school, the social group, or the work 
situation. It omits all consideration of sociological and economic factors. 
It makes no mention of behavior problems or delinquency. It falls 
short, therefore, of being a “basic” text for the orientation of teachers 
to guidance, and of being a professional treatment of such specialized 
aspects of guidance as vocational and educational adjustment. 

Criticism of the shortcomings of the book should not, however, ob- 
scure its good points. The chapters on therapeutic counseling, self- 
appraisal and career courses, community occupational surveys, and the 
role of work experience will be helpful in courses in guidance for high- 
school teachers and counselors. While comparable material is available 
in journal form, the concentration is sufficient to make this book useful 
as collateral reading in courses in secondary school guidance, or even, if 
well supplemented at its weak points, as a text. Its defects result partly 
from insufficient editing, but they reflect equally the polymorphous 
state of guidance work, which has become a profession in which the 
majority of practitioners are still non-professionally trained workers. 

DONALD E. SUPER. 

Teachers College, Columbia University. 